Doctrine and Experience: Their Influence
Upon the Psychotherapist 


William F. Fey 
Department of Psychiatry, University of Wisconsin 


The parameters of the psychotherapeutic 
experience have proved difficult to isolate and 
quantify. Admitted differences among thera- 
pists’ behaviors are known to be great (2, 3, 
4, 5, 6, 7, 8, 9), and actual differences are 
doubtless still greater. The most obvious question
about psychotherapy, “What happens
during the hour?”, is the one we are least
equipped with facts to answer. Transcripts of therapy
lack the nuances of sound and feeling; re- 
corded tapes neglect important visual clues. 
Motion pictures require elaborate, unnatural 
technical settings and are virtually nonexist- 
ent; and a panel of qualified but unbiased ob- 
servers is difficult even to conceive. Moreover, 
knowledge that the process is being observed 
or recorded at all is said to inhibit and trans- 
mute the experience. The principals them- 
selves, patient and therapist, are least objec- 
tive of all, for their reports, while firsthand, 
are inextricably colored by the powerful complex
of needs which has produced and sustains
their unique relationship.

Any simple report of what happens be- 
tween patient and therapist omits fundamen- 
tal information since, for example, one’s man- 
ner in refusing a gift is probably more impor- 
tant in the treatment process than is the fact 
of refusing it. But the fact and manner are 
not unrelated, and the patterning of facts 
ascribable to a therapist doubtless reflects 
something of his manner and his basic attitudes
toward patients. Therefore, despite the obvious
limitations of such reports as primary data, the
present study is based upon therapists’ reports of
their responses to patients. The assumptions
are made that however subtle and refined the 
underlying rationale of therapeutic strategy
may be, it must eventually be given expres-
sion in behavior, and that intensive scrutiny 
of such behavior will afford glimpses of basic 
therapeutic philosophy. For the present study, 
then, the working null hypotheses are these: 
that differences in doctrine and experience 
among therapists are unrelated both to their 
direct reports of behavior with patients and 
to the underlying attitudes which the pattern- 
ing of these behaviors might suggest. 


Method 


A simple questionnaire composed of issues 
commonly arising in the conduct of psycho- 
therapy was sent to each person in a mid- 
western university city who was known to be 
conducting consecutive interview treatment. 
The questionnaire items appear in Table 1. 
For the first 27 items, the scale of answers 
ranged from 1 (“yes” or “always” or “rou- 
tinely”) through 5 (“no” or “never”). The 
final two items summarize the respondents’ 
tendencies in marking the questionnaire items 
themselves. 
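
The two summary items can be tallied mechanically from a respondent's answers. The following is a minimal sketch in Python, assuming the 27 scaled answers are stored as integers from 1 to 5; the function name and the example protocol are hypothetical and are not taken from the study.

```python
def response_style_counts(answers):
    """Count midline ("3") and extreme ("1" or "5") answers on the 1-5 scale."""
    midline = sum(1 for a in answers if a == 3)
    extreme = sum(1 for a in answers if a in (1, 5))
    return midline, extreme


# Hypothetical protocol: made-up answers to the 27 scaled items.
example = [1, 3, 5, 2, 3, 4, 5, 3, 1, 5, 3, 2, 4, 3,
           5, 1, 3, 2, 4, 5, 3, 1, 2, 3, 5, 4, 3]

midline, extreme = response_style_counts(example)
print(midline, extreme)  # prints: 9 10
```

Answers of 2 and 4 enter neither tally, so the two counts need not sum to 27.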

The 36 therapists polled included 29 psy- 
chiatrists (18 of whom were board-certified), 
5 psychologists, and 2 psychiatric social work- 
ers. Of these, 34 (94%) responded, one re- 
plied to decline participation, and one failed 
to reply. Each identified his protocol only by 
stating the number of years he had been do- 
ing therapy and by indicating what theoreti- 
cal orientation, if any, he tended to observe. 
Doctrine and profession were known to be 
independent in this sample; the Rogerian 
group, for example, is not predominantly psy- 
chologists. These identifying variables are 
summarized in Table 2. 
Table 1

Questionnaire Items

1. Do you obtain from the patient a detailed case
history before beginning psychotherapy (interview
therapy)?

2. Would you accept a small gift from a patient in
psychotherapy?

3. Would the size of the gift influence your decision?

4. Ordinarily, do you answer general questions of fact
such as, “Are psychiatrists always doctors?”

5. As therapist, do you ask your patient questions?

6. Assuming you were licensed, would you write (e.g.,
sedative) prescriptions for your patient?

7. Do you interview the relatives or friends of a patient,
whom you’ve decided to see in psychotherapy,
for the purpose of obtaining more complete knowledge
about the patient?

8. Do you ask the patient for dream materials?

9. Do you contact a patient who fails to appear or to
notify you on two consecutive appointments?

10. Would you ever go to the patient’s home to do
psychotherapy?

11. Do you make any efforts to ease the environmental
stress upon your patient by contacting significant
persons in his life?

12. Ordinarily, do you answer personal questions of
fact from the patient such as, “Are you Catholic?”
or “Are you married?”

13. Are you willing (if you have the time) to discuss
nonemergency issues with your patient on the telephone?

14. Do you deliberately assume different therapeutic
roles with different patients?

15. Do you try to maintain the same therapeutic role
throughout the entire course of therapy with a
given patient?

16. Ordinarily, do you answer personal questions of
opinion such as, “Do you enjoy classical music?”

17. In the course of therapy your patient decides to
make an unpromising marriage. Would you make
your doubts about it known to him?

18. Would you deliberately avoid being in a social
situation with your patient?

19. If your patient asked you to talk with her husband
about her condition, would you be willing?

20. Typically, do you “interpret” your patient’s behavior,
in the sense of showing him its real significance
(meanings of which he is unaware)?

21. Assuming that you have open hours, do you offer
your patient a choice concerning the number of
interviews per week?

22. Do you or would you employ hypnosis or drug
preparations to aid your patient in producing material
for psychotherapy?

23. Do you extend, beyond the usual time limits, an
interview which is proving unusually productive?

24. Do you feel it is important not to “personalize”
your office with photos, knickknacks, etc.?

25. If a regular patient, not in emergency, phones to
request an extra, unscheduled interview, do you
grant it?

26. Do you attempt to maintain a definite and consistent
length of interview?

27. Ordinarily, would you reply frankly to your patient’s
question, “Do you think this (therapy) is
helping me so far?”

28. How many years have you been doing therapy?

29. The number of midline or “3” answers to the items.

30. The number of extreme or “1” and “5” answers to
the items.


Results 


Although the sample of respondents well 
represents a particular locale, serious prob- 
lems in interpretation arise in that the sam- 
ple contains no very experienced Rogerians, 
and the Analytic group is both small and di- 
verse in range of experience. To facilitate the 
test of hypotheses, the total sample was divided
into the following four groups: Rogerians
(N = 7), Analysts (N = 5), Young Eclectics
(those 8 therapists with fewer than 10
years’ experience), and Older Eclectics (the 
remaining 14 eclectics with 10 or more years’ 
experience). 

The data were first examined with re- 
spect to homogeneity within doctrines, testing 
the expectation that, for example, Rogerians 
should answer the items more like other Ro- 


Table 2
Respondents Grouped by Doctrine and Experience

Years of                      Doctrines
experience    Rogerian   Analytic   Eclectic   Totals

  1- 3          [?]        [?]         2          9
  4- 6          [?]        [?]         4          7
  7- 9          --         --          2          2
 10-12          --         --          4          4
 13-15          --          1          4          5
 16-18          --         --         --          0
 19-21          --          1          3          4
 22-24          --         --          3          3

Totals           7          5         22         34
Table 3
Correlations of Each Therapist with the Model or Average of Each Group*

Therapist     Models: Rog.   Ana.   YEc.   OEc.

YEc. model column of the left-hand panel
(Rogerian therapists, then Young Eclectic therapists):

+39 +24 +24 +52 +27 +04 +21

+32 +28 +56 +42 +33 +34 +31 +32

* When df = 27, r ≥ .37, p < .05; r ≥ .47, p < .01.


gerians than like Eclectics. An informal meas- 
ure of such homogeneity is to be found in the 
correlation of each therapist’s answers, item 
by item, with the average response of his own 
group—himself excluded. These values appear 
in Table 3, where even a casual inspection 
would suggest that Rogerians are most homo- 
geneous, the Analysts least. The correlations 
of Rogerians with their own average have a 
mean of + .79, Analysts with their average 
+ .25, Young Eclectics + .36, and Older Ec- 
lectics + .54. 
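
The homogeneity measure just described (each therapist correlated, item by item, with the average response of the rest of his own group) can be sketched as follows. This is a minimal illustration, assuming each therapist's answers are held as a list in a common item order; the function names and the three short protocols are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined when either list is constant


def own_group_correlations(group):
    """Correlate each member with the item means of the other members."""
    results = []
    for i, answers in enumerate(group):
        others = [g for j, g in enumerate(group) if j != i]
        # Average response of the remaining members, item by item.
        model = [sum(column) / len(column) for column in zip(*others)]
        results.append(pearson(answers, model))
    return results


# Hypothetical answers (1-5 scale) of three therapists to five items.
group = [
    [5, 4, 5, 1, 2],
    [5, 5, 4, 2, 1],
    [4, 5, 5, 1, 1],
]
correlations = own_group_correlations(group)
print([round(r, 2) for r in correlations])
print(round(sum(correlations) / len(correlations), 2))  # mean for the group
```

Excluding the therapist from his own group average keeps the correlation from being inflated by his own answers, which matters most in small groups such as the Analysts.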

Each of the four groups has an average re- 
sponse to each item which, when collated, 
offers a kind of “model” response for that 


Table 4
Correlations Among Group Average Responses*

                                          Young
Group             Rogerian   Analytic   Eclectic

Analytic            +.36
Young Eclectic      +.30       +.68
Older Eclectic      −.36       +.20       +.48

* When df = 27, r ≥ .37, p < .05; r ≥ .47, p < .01.


group to the entire questionnaire. It thus be- 
comes possible to compare each therapist with 
the models of the doctrines he does not claim 
to represent. These correlations, which also 
appear in Table 3, further clarify how each 
therapist relates to the several doctrines ex- 
amined. Finally, these models may themselves 
be correlated to produce the figures shown in 
Table 4, in which it appears that the Ana- 
lysts and Young Eclectics are most similar as 
groups in their replies to the items, whereas 
Older Eclectics and Rogerians are least alike. 

Next, a mean score for each group on each 
item was computed, and these sets of means 
subjected to separate F tests. The data ap- 
pear in Table 5, which includes as well the 
within groups variance (df = 30) for readers 
who may wish to test a private hypothesis re- 
garding item differences. Sharp discrepancies 
among the group means are clearly present, 
permitting rejection of the first general hy- 
pothesis—that differences in doctrine and ex- 
perience among psychotherapists are unre- 
lated to their direct reports of behavior with 
patients. 
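
For readers who wish to retrace the item tests, the F ratio for a single item can be computed from the four groups' answers as sketched below. The group sizes (7, 5, 8, and 14) follow the text and give 3 and 30 degrees of freedom; the answers themselves are hypothetical, and the function name is an assumption made for illustration.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within, within-groups variance)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


# Hypothetical answers to one item, grouped as in the study.
item_answers = {
    "Rogerian": [5, 5, 4, 5, 5, 4, 5],
    "Analytic": [3, 4, 2, 3, 3],
    "Young Eclectic": [3, 3, 4, 2, 3, 3, 4, 3],
    "Older Eclectic": [2, 3, 2, 1, 3, 2, 2, 3, 2, 1, 2, 3, 2, 2],
}
f_ratio, df_b, df_w, within_variance = one_way_f(list(item_answers.values()))
print(df_b, df_w)  # prints: 3 30
print(round(f_ratio, 2), round(within_variance, 2))
```

The last value returned is the within-groups variance, the quantity tabled alongside the group means for each item.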

Finally, as an approach to answering the 
Therapist     Rog.    Ana.    OEc.

Rog.  1       +74     +41     −25
      2       +78     +33     −30
      3       +82     +30     −42
      4       +56     +42     −19
      5       +91     +22     −28
      6       +72     +18     −16
      7       +85     +26     −48

YEc.  1       +28     +13     +09
      3       −06     +22     +41
     11       +02     +462    +58
     16       +48     +437    +03
     18       −19     +23     +75
     19       +48     +36
     20       +60     +47     −36
     21       +41     +41     +67

Therapist     Rog.    Ana.    YEc.    OEc.

OEc.  2       −21     +23     +03     +48
      4       −65     +08     +04     +54
      5       +37     +42
      6       −32     +07     +32     +80
      7       −11     +30     +58     +40
      8       +06     +18     +25     +22
      9       +22     +31     +35     +31
     12       −2[?]   +03     +24     +59
     10       −42     +08     +36     +80
     13       −0[?]   −17     +43
     14       −05     +30     +44
     15       −08     +17     +34     +53
     17       −27     −20     +16     +46
     22       +34     +52     +71

Ana.  1       −25     +27     +34     +24
      2       +26     +59     +54     +13
      3       +04     +25     +38
      4       +70     −12     +04     −41
      5       +11     +18     +54     +35
Table 5
Differences Among Group Means on Questionnaire Items

Group   Rog.  Ana.  YEc.  OEc.        Group   Rog.  Ana.  YEc.  OEc.
N        7     5     8    14          N        7     5     8    14
Items                                 Items
1 2.13 16 4.7 34 34 20 0.67 01 
2 34 20 21 2.8 2.53 a 17 49 34 30 24 1.83 01 
3 3.4 34 2.96 18 40 30 28 2.5 2.33 
4 0.93 05 19 41 24 14 11 0.60 01 
5 44 20 32 2.1 1.60 01 20 46 36 32 28 1.03 01 
6 41 3.2 34 3.1 0.87 _— 21 3.6 2.6 1.8 1.9 1.93 _- 
7 46 42 28 21 1.03 O1 22 46 46 40 33 0.70 01 
8 5.0 3.0 3.1 24 1.50 1 23 43 3.2 3.1 2.1 1.23 01 
9 an” oo 1.5 2.07 —- 24 3.7 2.2 28 1.4 1.63 01 
10 46 42 1.43 25 2.9 2.1 1.67 05 
il 5.0 34 3.0 2.7 0.57 01 26 5.0 5.0 5.0 3.8 1.10 05 
12 3.6 3.2 2.9 1.9 1.47 05 27 5.0 3.6 34 2.1 2.33 01 
13 49 3.0 2.8 1.9 2.00 01 28 4.7 40 44 2.6 1.00 01 
14 4.0 3.0 2.8 2.5 3.67 _ 29 43 3.6 34 34 1.20 _— 
15 49 22 3.5 2.3 2.43 01 30 21 a6—=<CS C8 1.30 


second general hypothesis, all 30 items of the 
questionnaire were intercorrelated, factored 
by the centroid method, and rotated to yield 
the oblique solution shown in Table 6.¹,²
The item communalities are reproduced in 
this table as evidence of the efficiency of the 
solution in accounting for the item variances. 

The four factors and those items contribut- 
ing most heavily to them may be regarded as 
empirically homogeneous yet fairly independ- 
ent scales in terms of which group differences, 
if present, might be expected to emerge. To 
examine this possibility, the purest items from 
each factor were grouped into scales and each 
therapist given a score on each scale. Scale I 
was composed of Items 9, 10, 18, and 30; 
Scale II of Items 1, 2, 4, 6, and 14; Scale III 
of Items 7, 11, 22, 26, and 27; and Scale IV 
of Items 5, 15, and 29. The relative independ- 
ence of these measures is shown by their in- 
tercorrelations in Table 7, and their sensitiv- 
ity to group differences appears in Table 8. 
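
The scoring step can be illustrated with the item lists just given. The sketch below assumes a simple unweighted sum of the raw item values; the source does not say whether any items were reversed or weighted before summing, so the keying, the function name, and the example protocol are all assumptions.

```python
# Item composition of the four scales, as listed in the text.
SCALES = {
    "I": (9, 10, 18, 30),
    "II": (1, 2, 4, 6, 14),
    "III": (7, 11, 22, 26, 27),
    "IV": (5, 15, 29),
}


def scale_scores(protocol):
    """Sum a therapist's item values within each scale (unweighted, assumed)."""
    return {name: sum(protocol[item] for item in items)
            for name, items in SCALES.items()}


# Hypothetical protocol: item number -> value, here 3 on every item.
protocol = {item: 3 for item in range(1, 31)}
print(scale_scores(protocol))  # prints: {'I': 12, 'II': 15, 'III': 15, 'IV': 9}
```

No item appears in more than one scale, which is part of what keeps the scale scores relatively independent.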

The clearest test of the effects of doctrine 


¹ The writer is grateful to Chester W. Harris of
the University of Wisconsin for his counsel on this
portion of the study.

² A two-page table containing the original correlation
matrix has been deposited with the American
Documentation Institute. Order Document No. 5738,
remitting $1.25 for microfilm or $1.25 for photocopies.


is thought to be between the Rogerian and 
Young Eclectic groups, and that of the ef- 
fects of experience between the Young and 
Older Eclectics. Table 8 suggests that all four 
scales are sensitive to the variable, Doctrine, 
and that Scales III and IV are influenced by 
the variable, Experience. 


Discussion


The homogeneity data sustain the impres- 
sion of Rogerian structure as being relatively 
well-defined and consistent from patient to 
patient. Unexpected, however, is the com- 
monality among this group of eclectics who, 
while avoiding doctrinaire alignments, appear 
independently to have defined a doctrine of 
their own. The low agreement among these 
analysts perhaps reflects mainly their small 
number and heterogeneous experience. 

Comparing individual therapists with their 
own and with foreign doctrines serves to em- 
phasize the range of behaviors within schools. 
For example, YEc. 20 and Ana. 4 align with 
the Rogerians; YEc. 18 and YEc. 21 with the
Older Eclectics. Relatively pure partisans ap- 
pear, such as Rog. 6 and OEc. 4, as well 
as an eclectic’s eclectic (OEc. 9), who has 
roughly equal loadings in each doctrine. 

In choosing the basic groups for further 
analysis, one is tempted to adopt the func- 
Table 6
Communalities and Item Loadings for Rotated Factorial Solution 


Item content 


Oblique factors 


No contact after 2 misses 
Goes to patient’s home 

Does not avoid pt. socially 
Gives many dogmatic answers 


Takes case history first 

Would accept small gift 

Answers, ‘‘Are psychiatrists . . .” 
Writes prescriptions 

Diff. roles with diff. pts. 

Sets frequency of interviews 


Interviews relatives 

Eases environment 

Employs hypnosis and drugs 
No fixed interview length 
Answers, “Is therapy helping” 
Grants extra hour 

Answers, “Enjoy music . . .” 
Personalizes office 

Extends productive hour 


Warns of bad marriage 
“Interprets” 

Would talk with spouse 
Talks on phone 

Answers, ‘‘You married?” 
Asks for dreams 


Asks questions 

Diff. roles with same patient 
Gives many midline answers 
Many years doing therapy 


tional clusters suggested by the correlations 
of Table 3, rather than the stated allegiances 
of Table 2. However, such a choice would 
spuriously maximize latent group differences 
and serve to deny the fact of significant varia- 
tions among adherents of a particular school. 

The correlations among the four group 
models shown in Table 4 are thought to re- 
flect some historical developments. The dis- 
tinction between young and older eclectics 
is more than one of years. As these data 
were gathered in 1953, this split corresponds 
roughly to the pre- and post-World War II 
training in psychiatry—a period which saw, 
among other changes, a broader acceptance 
of modern psychodynamic principles. It thus 


seems natural that the Young Eclectics should 
be closer to the Analysts (+ .68) than are the 
Older Eclectics (+ .20). 


Table 7
Correlations Among Scales Based Upon Factor Analysis*

Scale

Scale II
Scale III
Scale IV

* When df = 32, r ≥ .35, p < .05; r ≥ .45, p < .01.
h²  I  II  III  IV
292 —48 31 17 
314 47 00 00 
20 2 —04 
552 —52 33 —23 
411 —26 55 14 
417 08 57 —15 
785 —21 79 15 
439 15 44 —01 
677 00 74 00 
357 -31 39 37 
7 720 —11 43 16 
ll 655 —03 34 24 
22 590 —08 34 10 
26 391 00 00 00 
27 661 —04 21 34 
25 320 02 14 0s 
16 737 —25 35 42 
24 709 04 —21 45 
23 702 —18 43 40, 
17 490 —02 20 «4 
20 571 37 12 08 
19 720 17 20 47 
13 538 —06 34 41 
12 628 —13 49 16 
8 698 —08 45 22 
5 595 —16 30 61 
15 494 23 —07 50 
29 431 00 00 58 
28 616 —24 13 63 
             Scale I   Scale II   Scale III
Scale II      +.10
Scale III     −.00      +.50
Scale IV      +.21      +.35       +.38
Table 8
Group Means, F Test Probabilities, and Selected Comparisons Among the Factor Scales

                     Group Means                               Differences
                               Young      Older             Rog. vs.    Yng. Ec. vs.
Scale   Rogerian   Analytic   Eclectic   Eclectic    p*     Yng. Ec.    Older Ec.

I        16.43      12.00      13.00      14.78     <.07     <.050        NS
II       18.00      13.00      12.62      11.86     <.07     <.050        NS
III      24.14      20.80      18.12      14.00     <.01     <.001       <.025
IV       13.57       7.80      10.12       7.78     <.01     <.025       <.050

* For between-within analysis of variance, with 3 and 30 df.


Since any intensive search for group dif- 
ferences is certain to disclose them, includ- 
ing spurious ones, it may be wise to under- 
interpret the individual item variations shown 
in Table 5. The burden of distilling from 
these the broader patterns of variation is 
properly borne by the factor solution. The 
choice of items for the scales derived from 
the factor solution was purely empirical, the 
intent being to use as many items as possible
without destroying the relative independence
of the scale scores. It turned out that
no two scales share over 25% common variance,
whereas in the original oblique solution, no 
two factors shared over 13% common vari- 
ance. While the internal consistencies of scales 
as brief as these are difficult to establish 
by traditional methods, such as the Kuder- 
Richardson formulae, their factorial origin 
offers direct evidence of their relative homo- 
geneity. 
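
The figure for common variance follows from squaring a correlation. As a worked check, the sketch below squares the six scale intercorrelations as they read in Table 7; the variable names are illustrative only.

```python
# Scale intercorrelations as they read in Table 7.
scale_correlations = {
    ("I", "II"): 0.10,
    ("I", "III"): -0.00,
    ("II", "III"): 0.50,
    ("I", "IV"): 0.21,
    ("II", "IV"): 0.35,
    ("III", "IV"): 0.38,
}

# Percentage of variance two scales share: 100 * r squared.
shared = {pair: 100 * r ** 2 for pair, r in scale_correlations.items()}
largest_pair = max(shared, key=shared.get)
print(largest_pair, round(shared[largest_pair], 1))  # prints: ('II', 'III') 25.0
```

By the same arithmetic, the 13% ceiling reported for the original oblique factors corresponds to a correlation of about .36 in absolute size.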

Granting that the interpretation of meas- 
ures resting upon factored data has always 
to be provisional, the cues for the hypotheses 
which follow appear mainly in Table 6. Fac- 
tor I seems to the writer to represent a di- 
mension involving a distinct, perhaps rigid 
separation of professional and social roles in 
the therapist, most pronounced among the 
Rogerians. One pole of Factor II, with its em- 
phasis upon getting and giving information 
and help, seems associated with the tradi- 
tional, rational, flexible approach to health 
problems, and is least characteristic of the 
Rogerians. These two would seem to be the 
“principled” differences among the groups 
studied. 

Factor III is thought to reflect the broad-spectrum,
resourceful, somewhat supportive
approach to patients, and Factor IV, with 
its focus upon the activity of the therapist, 
stresses his artful, almost expedient virtuosity 
in dealing with his patients from moment to 
moment. Differences in these last two dimen- 
sions appear to be associated with both the 
therapist’s doctrine and his experience. 

In a series of papers, Fiedler (1, 2, 3) 
demonstrated that the dimension of communi- 
cation in psychotherapy (How well does the 
therapist understand?) is relatively independ- 
ent of the therapist’s doctrine but is sharply 
associated with his expertness, whereas the 
dimensions of status (ranging from deference 
to condescension) and, to a lesser extent, 
emotional distance (ranging from repulsed to 
seductive) reflect variations in doctrine but 
not in expertness. In attempting to relate 
Fiedler’s work to this, it is well to note that, 
although they often pass as synonymous, ex- 
perience is not expertness, and on this latter 
variable the studies may not be directly com- 
pared. The relationships among status, emo- 
tional distance, and therapeutic doctrine may, 
however, be examined, provided one is willing 
to risk some translation of the data. 

In general, Fiedler’s judges found the Ro- 
gerian therapists, irrespective of expertness, 
to be (perhaps tensely) passive and on equal 
status with their patients; Analytic therapists 
were described as aloof. The former finding is 
perhaps mildly confirmed in the present Items 
5, 8, 10, 11, 17, 19, and 20, in all of which 
the Rogerian therapists shun the initiative. 
Evidence of analysts’ aloofness is not promi- 
nent in these data, however, probably since 
this quality relates almost exclusively to the 
spirit in which behavior is initiated. It would
have been convenient if one of Fiedler’s ma- 
jor dimensions could be shown to coincide 
with certain of the factors isolated by this 
work; in the writer’s judgment, none con- 
vincingly does. 

Recent work of Strupp (6, 7, 8) offers some 
basis for comparison. Strupp presented sam- 
ple situations to therapists, recorded their re- 
sponses, and then classified these according 
to the Bales interactional system. Among his
conclusions is that with experience comes di- 
versification in technique. The present study 
tends to confirm this, in that years of experience
correlates + .47 (p < .01) with number
of midline or “3” answers; older therapists
entertain more “possible” courses of
action. Strupp’s conclusion that experienced 
psychiatrists tend to interpret is not confirmed 
here, in that Items 20 (“Do you interpret
...?”) and 28 (years doing therapy) correlate
only + .18. Strupp also found that,
with greater experience, Rogerian psycholo- 
gists tended to be more like analytically ori- 
ented psychologists. While the present study 
lacks experienced Rogerians as a test of this 
finding, it is evident from the correlations in 
Table 4 that experienced eclectics are much 
less analytically oriented than their younger 
colleagues. 

Some general reservations about the data 
remain to be stated. The basic measures 
themselves are primary only in the sense 
that these are ways in which clinicians are 
willing to describe themselves. Next, the 
sample of respondents itself, to the degree it 
fails to represent the universe of psychothera- 
pists, introduces its own bias into the fac- 
torial solution—which solution is, in turn, a 
permissive method of ordering relationships. 
Finally, the questionnaire items represent the 
language and concerns of a clinician whose 
orientation is largely Rogerian. This fact 
may in some ways make it more difficult for 
individuals representing other frameworks to 
portray fairly and fully their particular man- 
ner with patients. 


Summary 


This study examines the extent to which 
differences in doctrine and in experience tend 
to be reflected in the behavior of psychothera- 
pists. The primary data consist of therapists’ 
reports of their handling of issues often aris- 
ing in treatment. The respondents comprise 
virtually every psychotherapist from a par- 
ticular locale, subdivided into these groups: 
Rogerians, Analysts, Young Eclectics, and 
Older Eclectics. Correlations among thera- 
pists indicate greatest homogeneity among 
the Rogerians, least among the Analysts. 
Comparisons of the responses typical for 
each group suggest that the Analysts and 
Young Eclectics resemble each other most, 
while Older Eclectics and Rogerians are least 
alike. Sharp item differences appear among 
the four groups studied, and a factor analysis 
of these items yields four clusters; brief 
scales based upon these clusters produce di- 
mensions which are differentially sensitive to 
the variables of doctrine and experience.
The Problem of Early Termination:
Is it Really the Terminee? 


Ralph H. Gundlach and Max Geller 
Postgraduate Center for Psychotherapy, New York 


A recent paper by Taulbee (5) reviewed 
the question of the early terminator in psy- 
chotherapy. The approach was to explore the 
character structure of this type of patient, as 
contrasted with that of the patient who con- 
tinues in therapy. Several studies from other 
hospitals and clinics on this topic approach 
the question in similar ways (1; 2, p. 43 ff.; 
3; 6). 

While these studies report from 30% to 
60% of their patients as terminating in 1 to 
5 sessions, the rate of termination of patients 
at the Postgraduate Center, after 1 to 5 ses- 
sions, is about 6%. 

Data from two surveys at the Center are 
pertinent. During 1957, 129 patients entered 
therapy. As of May 1, 1958, 97 of these were 
still in treatment, while 32 had terminated. 
Of the latter, 8 had 5 or fewer sessions, and 
24 had 19 or fewer sessions. An earlier study 
had included half of all the cases terminating 
in 1954. The lowest tally-group of these 130 
cases comprised 1 to 10 sessions, which took 
in 23% of the cases. Of these, 3 had left the 
city, 2 had been hospitalized, and 2 had died. 
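
The rate of about 6% can be checked against the 1957 counts given above. The following is a small worked calculation, using only figures stated in the text; the variable and function names are illustrative.

```python
entered_1957 = 129
still_in_treatment = 97
terminated = 32
five_or_fewer_sessions = 8

# The two outcome counts should account for every patient who entered.
assert still_in_treatment + terminated == entered_1957


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


early_rate = percent(five_or_fewer_sessions, entered_1957)
terminated_rate = percent(terminated, entered_1957)
print(early_rate, terminated_rate)  # prints: 6.2 24.8
```

Both percentages take all 129 entrants as the base, the same base on which the other clinics' 30% to 60% figures are presumably reported.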

What can account for these striking differ- 
ences? The Center patients seem much like 
those of any other outpatient clinic in age, 
education, social status, and problems. Per- 
haps the screening of patients is somewhat 
different. 

It is our opinion that the difference is 
largely due to the orientation and objectives 
of the therapists at the various institutions. 
We perhaps seek more personality reconstruc- 
tion than do the therapists at many other in- 
stitutions (4). Borderline schizophrenics and 
difficult character disorder patients can get 
involved in therapy for many months. 


Evidence that the Postgraduate Center has 
a different orientation comes also from data 
on duration of therapy. In three of the studies 
cited, it is possible to estimate the median 
number of sessions: 8, 17, and 26. In con- 
trast, the median number of sessions for pa- 
tients at the Center terminating in 1954 was 
about 44. The survey of May 1958 showed 
129 who entered therapy during 1957, and 
262 others in treatment who had started prior 
to 1957. Thus, the patient of median duration 
was in treatment well over one year. 
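
The step from these counts to the median can be spelled out. The sketch below assumes that the May 1958 caseload consists of the two groups named, and that every patient who started before 1957 had by then been in treatment longer than any patient who entered during 1957; the variable names are illustrative.

```python
started_during_1957 = 129
started_before_1957 = 262
caseload = started_during_1957 + started_before_1957

# Rank patients from longest to shortest time in treatment.
# Those who started before 1957 occupy the first 262 ranks.
median_rank = (caseload + 1) // 2
median_started_before_1957 = median_rank <= started_before_1957

print(caseload, median_rank, median_started_before_1957)  # prints: 391 196 True
```

Since the median patient began before 1957 and the survey was taken in May 1958, that patient had been in treatment for more than a year.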

The possibility arises, then, that some of 
the studies purporting to measure differences 
in the personality of the “attriters” and “re- 
mainers” are also measuring, indirectly, the 
kind of personality problems that the staff is 
interested in, or skilled at, handling. 


Received August 1, 1958. 
Remainer Patient Attributes and Their Relation to
Subsequent Improvement in Psychotherapy


Martin M. Katz,¹ Maurice Lorr, and Eli A. Rubinstein


Reports from mental hygiene clinics indi- 
cate that of all patients accepted for psycho- 
therapy, some 30 to 65% will terminate treat- 
ment prematurely against the advice of the 
therapist. Of those patients who remain, only 
two-thirds will be subsequently rated im- 
proved as a result of treatment (2). These 
findings have raised a number of questions 
relative to our understanding of the kind of 
patient now deriving benefit from psychother- 
apy. With respect to the problem of predict- 
ing length of stay in treatment, Frank et al. 
(3) have reviewed the significant progress 
made in this area. In more recent intensive 
investigation of the problem (5), the general 
finding that premature terminators and re- 
mainers can be distinguished with respect to 
certain patient attributes, was confirmed. The 
knowledge, however, that patients who re- 
main do not necessarily improve in treatment 
led to the question of whether the attributes 
found to discriminate between terminators 
and remainers were predictive in some degree 
of subsequent improvement also. The present 
study was intended as a preliminary investi- 
gation of this problem. In addition, certain 
therapist variables, such as years of experi- 
ence, not found to relate significantly to 
length of stay but presumed to be relevant 
to the improvement criterion, were included 
in the study. 


Method 


As part of a larger study of premature 
termination of treatment, a nationwide group 


¹ Now with Psychopharmacology Service, National
Institute of Mental Health, Silver Spring, Md.

² From the Veterans Administration, Veterans Benefits
Office, Washington, D. C.


Veterans Administration²
of 13 Veterans Administration Mental Hy-
giene Clinics, using a brief self-administered 
battery of five tests, collected data on all 
cases considered for psychotherapy (4). Pa- 
tients examined for hospitalization, neurologi- 
cal disorders, or for brief (half hour or less) 
irregular treatment were excluded. The intake 
social worker completed a form at the close 
of the interview descriptive of such patient 
characteristics as occupation, annual earnings, 
highest grade completed, and presenting prob- 
lems. In addition, the worker made a predic- 
tion as to how long the patient would remain 
in treatment. At the end of the six-month pe- 
riod, if a patient still remained in treatment, 
the therapist rated the extent to which the 
patient had changed or improved as a result 
of psychotherapy. 

The 232 cases included in the study were 
randomly split into two samples of 116 cases. 
Each subsample consisted of 58 cases who 
terminated treatment with six weeks of treat- 
ment or less and 58 cases with 26 weeks of 
treatment or more. Analysis of data in the 
two samples by means of a double cross- 
validation procedure yielded three short tests 
which successfully differentiated patients who 
remain from those who terminate prema- 
turely. One subscale was derived from items 
included in a Behavior Disturbance Scale, 
which is designed to elicit information con- 
cerning the extent of antisocial behavior in 
the patient’s past, lack of impulse control, 
lack of personal ties, and lack of goal persistence.
Another subscale was selected from
the Taylor Manifest Anxiety Scale (6). The 
third subscale consisted of items selected from 
the Adorno, Levinson, et al. (1) F scale of 
authoritarianism. Scores on these three tests 
and the social worker’s prediction of length
of stay combined configurally correlated .54 
with the terminator-remainer criterion in the 
cross-validation sample. In addition, annual 
earnings and the occupational level of the pa- 
tient were found to discriminate significantly 
between the terminators and remainers. 
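
The logic of a double cross-validation can be sketched briefly: a rule derived in each sample is tested in the other. The sketch below is not the procedure of the study, which combined three subscales and the worker's prediction configurally and reported a correlation; here a single cutting score and a hit rate are substituted for illustration. The function names, the seed, and the perfectly separable scores are hypothetical.

```python
import random


def stratified_halves(cases, seed=0):
    """Split cases at random into two samples, balanced on the criterion."""
    rng = random.Random(seed)
    sample_a, sample_b = [], []
    for label in (0, 1):
        members = [c for c in cases if c["remainer"] == label]
        rng.shuffle(members)
        half = len(members) // 2
        sample_a.extend(members[:half])
        sample_b.extend(members[half:])
    return sample_a, sample_b


def fit_cutoff(sample):
    """Cutting score midway between the two group means, plus its direction."""
    means = {}
    for label in (0, 1):
        scores = [c["score"] for c in sample if c["remainer"] == label]
        means[label] = sum(scores) / len(scores)
    return (means[0] + means[1]) / 2, means[1] >= means[0]


def hit_rate(sample, cutoff, higher_means_remainer):
    """Proportion of cases the cutting score classifies correctly."""
    hits = 0
    for c in sample:
        predicted = int((c["score"] > cutoff) == higher_means_remainer)
        hits += int(predicted == c["remainer"])
    return hits / len(sample)


def double_cross_validate(cases, seed=0):
    """Derive the rule in each half and test it in the other half."""
    sample_a, sample_b = stratified_halves(cases, seed)
    cutoff_a, direction_a = fit_cutoff(sample_a)
    cutoff_b, direction_b = fit_cutoff(sample_b)
    return (hit_rate(sample_b, cutoff_a, direction_a),
            hit_rate(sample_a, cutoff_b, direction_b))


# Hypothetical data: 116 terminators (0) and 116 remainers (1), fully separable.
cases = [{"remainer": 0, "score": i % 10} for i in range(116)]
cases += [{"remainer": 1, "score": 20 + i % 10} for i in range(116)]

sample_a, sample_b = stratified_halves(cases)
print(len(sample_a), len(sample_b))  # prints: 116 116
print(double_cross_validate(cases))  # prints: (1.0, 1.0)
```

With real scores the two hit rates would fall below 1.0, and agreement between them is what indicates that the rule holds up outside the sample in which it was derived.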

For the present analysis, three variables de- 
scriptive of 116 patients remaining in treat- 
ment for six months or longer were correlated 
with the therapist rating of improvement. 
These variables were: (a) the configural score 
based on a combination of the three subscales;
(b) the occupational level of the patient;
and (c) the patient’s annual earnings
for the preceding year. 

Three therapist variables were also corre- 
lated with the rating of improvement, al- 
though none had been found to relate signifi- 
cantly to length of stay in treatment. These 
were included in order to explore in a pre- 
liminary fashion the relation of therapists’ 
characteristics to their judgment of improvement.
These variables were: (a) the therapist’s
years of experience; (b) the therapist’s
diagnostic classification of the patient; and 
(c) the presence or absence of personal analy- 
sis in the experience of the therapist. For the 
diagnosis, each patient was classified as psy- 
choneurotic, psychosomatic, a character dis- 
order, or as psychotic. In the analyses, the 
first two and the last two were combined. 
Product-moment and point-biserial correla- 
tions were used. 
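
A point-biserial correlation relates a two-category variable, such as the presence or absence of personal analysis, to a continuous one, such as the improvement rating. The following is a minimal sketch of the usual formula; the function names and the four data points are hypothetical and are not taken from the study.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def point_biserial(binary, values):
    """Point-biserial r between a 0/1 variable and a continuous variable."""
    n = len(values)
    ones = [v for b, v in zip(binary, values) if b == 1]
    zeros = [v for b, v in zip(binary, values) if b == 0]
    p = len(ones) / n  # proportion coded 1
    q = 1 - p          # proportion coded 0
    overall = mean(values)
    sd = sqrt(sum((v - overall) ** 2 for v in values) / n)  # population SD
    return (mean(ones) - mean(zeros)) / sd * sqrt(p * q)


# Hypothetical example: 0/1 coding against a rating.
analysis = [0, 0, 1, 1]
rating = [1, 2, 3, 4]
print(round(point_biserial(analysis, rating), 3))  # prints: 0.894
```

The result is identical to the product-moment correlation obtained when the dichotomy is coded 0 and 1, so the two kinds of coefficient can be read on the same scale.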


Results and Discussion 


Although all of the above described test 
and actuarial patient variables are signifi- 
cantly predictive of length of stay in treat- 
ment, not one correlates significantly with the 
measure of improvement employed in this 
study. Because these results are generally 
negative, any conclusions that can be drawn 
from the data must be highly tentative in 
nature. The findings suggest, however, that 
knowledge of those factors that determine 
whether a patient will remain in treatment 
may have little to tell us about whether he 
will subsequently improve. The failure of the 
socioeconomic variables to predict improve- 
ment confirms the finding by Frank et al. 
that patients of lower socioeconomic status,
although less likely to remain in treatment,
will, if they continue, demonstrate no less
improvement than patients from middle economic
groups. Consistent with this finding,
such configural test patterns as “low authoritarian,
low antisocial, high anxiety,” found to
be highly predictive of continuation in treatment,
did not demonstrate any relation to
extent of improvement.

Table 1
Correlations of Selected Therapist Variables with
Therapist Improvement Rating Criterion
in Samples A and B

Correlation
Sample A    Sample B

.49*
.34*

Variable
Therapist’s years of experience
Patient diagnostic classification

.39*

* Significant at .01 level.

Conclusions that can be drawn from these 
results are somewhat limited also by the na- 
ture of the criterion. The therapist’s rating, 
although of importance, can be considered as 
only one aspect of a complete objective as- 
sessment of patient change. An adequate 
evaluation of patient change would include 
self appraisals by the patient, objective test 
data, judgments of independent observers and 
follow-up data. Consequently, it is possible 
that the predictors of patient length of stay 
are related to some other aspects of what is 
surely a multidimensional criterion. 

The results of the analysis of the therapist 
variables are given in Table 1. The therapist’s 
rating was found to be highly related to his 
years of experience as a psychotherapist. 
Lacking other indices of improvement and 
therapeutic competence, we can only wonder 
whether increased years of experience actu- 
ally result in greater success in therapy or 
whether the standards and perceptions deter- 
mining improvement ratings of psychothera- 
pists change with increasing years in the field. 
We do know that the therapist’s diagnostic 
classification of the patient at the end of six 
months as either (a) psychoneurotic or psychosomatic
or (b) character disorder or psychotic,
did not correlate with years of experience,
but was highly related to the improvement
ratings. This finding suggests that
patients who are diagnosed as less severely ill 
are more likely to be rated improved, regard- 
less of the therapist’s experience. Finally, the 
improvement ratings appeared to have little 
or no relation to whether the therapist had 
or had not undergone personal analysis. Ob- 
viously more has to be known about the basis 
of therapist ratings and the relation of such 
ratings to other more objective indices of im- 
provement. 


Summary 


The aim of this study was to investigate 
the relation between patient attributes found 
to be significantly predictive of length of 
stay in psychotherapy and subsequent im- 
provement in treatment. A test variable, the 
configural score, and two actuarial variables 
(occupational level and annual earnings) 
were correlated with therapists’ ratings of im- 
provement following a six-month period of 
treatment. No one of the above variables was 
found to correlate significantly with the cri- 
terion. The results suggest that patient at- 
tributes associated with subsequent improve- 
ment may be very different from those that 
determine whether a patient will continue in 
treatment. The nature of the criterion, however,
imposes certain limitations on conclusions
that can be drawn from the data. It was
found, for example, that therapists’ improvement
ratings were significantly related to their
years of experience as therapists and to their
diagnostic classification of the patient following
six months of treatment. The need for
investigating additional hypotheses concern- 
ing characteristics of potential stayers in psy- 
chotherapy in relation to other independent 
criteria is emphasized.


A Note on McNemar’s “On Abbreviated
Wechsler-Bellevue Scales” 


William Howard 
Central Louisiana State Hospital, Pineville, Louisiana 


McNemar’s article (2), in which he com- 
puted the correlations between short forms 
and the full scale Wechsler-Bellevue for 
Wechsler’s standardization group, contains an 
error. The culprit is hiding in the first radical 
in the denominator of the special formula, and 
reduces all of the reported correlations slightly 
below their true values. 

Although studies have appeared on short- 
form—full-scale correlations for the WAIS (1, 
3, 4), the test which has officially replaced 
the WB, McNemar’s study on the WB re- 
mains important since the WB is still being 
used in some clinical settings. Furthermore, 
investigators compare their WAIS results with 
his WB findings. 

Maxwell (3) repeated McNemar’s pro- 
cedure on the WAIS standardization group. 
In general, she found WAIS short forms pro- 
duce higher correlations than do the WB 
short forms. Although the best WAIS com- 
binations do not contain the same subtests as 
the best WB forms, even McNemar’s reported 
WB correlations are lower than the correla- 
tions for the same WAIS combinations. Max- 
well refers to this in her statement: “With 
the exception of one duad, all WAIS combinations
yield larger correlations than corresponding
WB scales.” When the WB correlations
are corrected, however, this last
observation is not valid.

The first radical in the denominator of McNemar’s
general formula is $\sqrt{\sum \sigma_i^2 + 2\sum r_{ij}\sigma_i\sigma_j}$.
In arriving at his special formula he assumes
that all $\sigma$’s are equal to 3. This allows factoring
out the $\sigma^2$’s so that the radical becomes
$\sigma\sqrt{n + 2\sum r_{ij}}$. The term $\sum r_{ij}$, the sum of
subtest intercorrelations given by Wechsler in
Table 41 (5), is equal to 20.01, and $n$ is the
number of subtests. In the special formula for
WB, the value of the radical is $\sqrt{50.02}$, or
7.0725, rather than 7.26 as reported. (The $\sigma$
cancels out with the same factor in the numerator.)
Therefore, all of McNemar’s correlations
must be increased by a factor of
7.26/7.0725, or 1.0265.
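
The arithmetic of the correction can be checked in a few lines. This is a sketch of the computation described above, not McNemar's full formula: the value `n_subtests = 10` is an assumption inferred from 50.02 = n + 2(20.01), and the helper `corrected` is a hypothetical name introduced for illustration.

```python
from math import sqrt

n_subtests = 10          # assumed: the n implied by 50.02 = n + 2(20.01)
sum_r = 20.01            # sum of subtest intercorrelations (Wechsler, Table 41)
reported_radical = 7.26  # value of the radical as reported by McNemar

radical = sqrt(n_subtests + 2 * sum_r)    # sqrt(50.02)
correction = reported_radical / radical   # factor applied to each correlation

def corrected(r_reported):
    """Scale a reported short-form correlation by the correction factor."""
    return r_reported * correction

print(round(radical, 4), round(correction, 4))  # prints 7.0725 1.0265
```

For example, a hypothetical reported correlation of .900 would become about .924 after correction.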

This correction so increases the WB cor- 
relations that they exceed the corresponding 
WAIS correlations for 24 of the 40 short 
forms reported by McNemar. The WB cor- 
relations are equal to the WAIS results in 
three of the remaining instances. 

The correction does not alter the basic re- 
sult of Maxwell’s study. In spite of the in- 
creases, the best WB short forms are slightly 
inferior to the best WAIS combinations. In 
all cases, but especially for the tetrads and 
pentads, the differences are very small after 
the WB correlations are corrected.


Conditioning of Hostile Verbalizations in a Situation
Resembling a Clinical Interview¹


Arnold H. Buss
University of Pittsburgh

and Ann Durkee
Purdue University


A recurrent obstacle in interviewing and 
psychotherapy is the verbal inhibition of cli- 
ents. Topics that are associated with anxiety 
are avoided or discussed only with the great- 
est reluctance. Since little progress can be ex- 
pected unless problems are brought up and 
discussed, verbal inhibition is clearly an im- 
portant area for clinical research. 

The research may take two forms. First, 
verbatim interviews could be examined to de- 
termine the psychologist’s techniques of facili- 
tating verbalization, and then the success of 
the techniques might be measured. The second 
form involves setting up a laboratory analogue 
of the interview situation and investigating 
the effect of the experimenter’s (E’s) verbal 
behavior on the subject’s (S’s) verbalizations. 
The present study is of the second kind. 

It was felt that while the situation should 
resemble a clinical interview as closely as pos- 
sible, discrete trials should be employed for 
better control of reinforcement by the E. 
Therefore the task of making up sentences 
was used. This task was first used by Taffel 
(2), who had a series of 3 x 5 cards, each 
containing a verb and six pronouns. The S 
was instructed to make up a sentence using 
the verb and one of the six pronouns. After 
an initial period of free responding, the S was 
reinforced for every sentence containing the 
pronoun “I” or “We.” Taffel compared this 
procedure to a therapist’s reinforcing a cli- 
ent’s talking about himself. 


1 The writers are indebted to Herbert Gerjuoy,
George Wischner, and Melvin Manis for their helpful
suggestions.


Self-reference content is not usually asso- 
ciated with anxiety, whereas anxiety often 
accompanies verbalization of hostile content. 
Often clients are able to talk about their hos- 
tile impulses only after their strong inhibi- 
tions are removed or weakened. Such inhibi- 
tion may also be expected in a nonclinical 
population because it is part of general cul- 
tural training to suppress and punish hos- 
tility. 

The present study seeks to investigate the 
verbal conditioning of hostile and neutral ma- 
terial. Since inhibition is expected to retard 
verbalization of hostile content, the learning 
of hostile material should be slower than the 
learning of neutral material. Also, since men 
are allowed a greater variety and intensity of 
hostile response in our culture, men should 
learn hostile material faster than women. 


Method 
Stimuli 


Two considerations dictated the initial 
choice of words. First, the grammatical form 
should be constant, and therefore only verbs 
in the past tense were used. Second, slang and 
pedantry should be avoided, and only rela- 
tively familiar verbs were selected. 

In compiling a list of hostile words a wide range 
of intensity was noted, e.g., stabbed, tortured vs.
criticized, resented. It was decided to have two de- 
grees of hostility, mild and intense. A compilation 
of hostile and neutral words was submitted to six 
judges,² who sorted them into neutral, mildly hostile,
and intensely hostile groups. The criterion of
acceptance of a word was agreement by five of the
six judges.

2 The writers wish to thank Idel Bruckman, Mary
Oltean, Jerry Wiggins, and Marvin Zuckerman for
serving as judges. The writers served as the fifth and
sixth judges.

Next, the Thorndike-Lorge word count (3) was 
used to match the frequencies of the three types of 
verbs. Finally, the matched words were placed on 
3 x 5 white index cards. There were 30 such cards, 
each with a neutral, a mildly hostile, and an in- 
tensely hostile verb. Since a pilot study revealed that 
50 or 60 trials would be required for learning to 
occur, the number of cards had to be doubled. This 
was accomplished by using each of the words twice. 
The second time a word was used, it was accom- 
panied on the card by different words than had ac- 
companied it the first time. For example, tortured 
appeared on a card with argued and invented the 
first time and with flattered and rebelled the second 
time. The order of placement of the verbs on the 
cards and the order of presentation of cards were 
randomized for the series of 60 trials. 
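
One way to build such a deck can be sketched as follows. The pairing rule (a cyclic shift of the mildly and intensely hostile lists for the second use of each word) is an assumption chosen for illustration, not the writers' reported procedure. The word lists are placeholder labels, and frequency matching is not modeled.

```python
import random

def build_cards(neutral, mild, intense, seed=0):
    """Return 2 * n cards; each word appears twice, never with the same companions."""
    rng = random.Random(seed)
    n = len(neutral)  # 30 in the study
    cards = []
    for i in range(n):
        # First use: words at the same index share a card.
        cards.append([neutral[i], mild[i], intense[i]])
        # Second use: shift the mild and intense lists so companions differ.
        cards.append([neutral[i], mild[(i + 1) % n], intense[(i + 2) % n]])
    for card in cards:
        rng.shuffle(card)  # randomize placement of the verbs on the card
    rng.shuffle(cards)     # randomize order of presentation
    return cards

# Placeholder word lists (hypothetical labels, not the study's verbs).
neutral = [f"neutral_{i:02d}" for i in range(30)]
mild = [f"mild_{i:02d}" for i in range(30)]
intense = [f"intense_{i:02d}" for i in range(30)]

cards = build_cards(neutral, mild, intense)
```

With 30 words of each type this yields the 60-trial series, and no two words share a card more than once.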


Experimental Design 

There were two groups of Ss and 60 trials. 
During Trials 1-10, the Ss made up sentences 
but received no reinforcement. This free re- 
sponding period established the frequency of 
response for each type of word prior to con- 
ditioning. During Trials 11-60, one group of 
Ss was reinforced for using neutral verbs, and 
the other group was reinforced for using in- 
tensely hostile words. There was no group 
that was reinforced for using mildly hostile 
words; in this exploratory research the writers 
felt that any differences in learning should be 
maximized by using the extreme of hostility. 
Therefore, of the hostile verbs, only the in- 
tensely hostile ones were reinforced, and here- 
after the term hostile verbs will refer only to 
intensely hostile verbs. 


Procedure 


Each S was instructed to make up a sentence
using one of the three verbs on each
card. During Trials 1-10 the E said nothing.
During Trials 11-60 the E said “Right” after
the correct verb (one from the reinforced class)
was used and “Wrong” after an
incorrect verb was used.
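
The reinforcement rule can be written as a small function. This is a minimal sketch under stated assumptions: the function name and the class labels ("neutral", "mild", "intense") are hypothetical and introduced only for illustration.

```python
def experimenter_response(trial, verb_class, reinforced_class):
    """Return what E says after a sentence, or None during free responding.

    trial is 1-based; verb_class and reinforced_class are labels such as
    "neutral", "mild", or "intense" (hypothetical names for illustration).
    """
    if not 1 <= trial <= 60:
        raise ValueError("trial must be between 1 and 60")
    if trial <= 10:
        return None  # Trials 1-10: E says nothing
    return "Right" if verb_class == reinforced_class else "Wrong"
```

Under this rule a mildly hostile verb is always followed by “Wrong” during Trials 11-60, since no group was reinforced for that class.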


Subjects 


The Ss were 80 college students, 40 men 
and 40 women. Within each sex, Ss were assigned
randomly to either the neutral reinforced
group or the hostile reinforced group,
and each group had 20 men and 20 women.

3 Words were grouped into three categories: 0-24,
25-50, and above 50 times per million according to
the Thorndike-Lorge count. Thus the three kinds of
verbs were matched on the basis of the frequency
range in which they fell.

Fig. 1. Acquisition curves of men and women reinforced
for hostile and neutral words.


Results 

The 60-trial series was divided into six 
blocks of ten trials each. The measure of 
learning is the frequency of occurrence of the 
response class that was reinforced. For the 
Neutral group the frequency of neutral words 
per block of ten trials is the measure, and for 
the Hostile group the frequency of hostile 
words per block of ten trials is the measure. 
The learning curves for the two groups and 
for both sexes are plotted in Fig. 1. Inspec- 
tion of these curves reveals clear-cut differ- 
ences in the height of the curves. Also the 
rate of learning hostile words (the shape of 
the curves) appears to be slightly faster than 
the rate of learning neutral words. 
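
The measure of learning can be illustrated with a short sketch that counts reinforced-class responses per block of ten trials. The response record below is hypothetical data for a single S in the Hostile group, invented for the example.

```python
def block_frequencies(responses, reinforced_class, block_size=10):
    """Count reinforced-class responses in each consecutive block of trials."""
    return [
        sum(1 for r in responses[start:start + block_size] if r == reinforced_class)
        for start in range(0, len(responses), block_size)
    ]

# Hypothetical 60-trial record for one S in the Hostile group.
responses = (
    ["neutral"] * 8 + ["intense"] * 2      # Trials 1-10 (free responding)
    + ["neutral"] * 6 + ["intense"] * 4    # Trials 11-20
    + ["mild"] * 5 + ["intense"] * 5       # Trials 21-30
    + ["neutral"] * 4 + ["intense"] * 6    # Trials 31-40
    + ["neutral"] * 3 + ["intense"] * 7    # Trials 41-50
    + ["neutral"] * 2 + ["intense"] * 8    # Trials 51-60
)

curve = block_frequencies(responses, "intense")  # [2, 4, 5, 6, 7, 8]
```

Averaging such six-point curves over the Ss in each group and sex would give acquisition curves of the kind plotted in Fig. 1.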

The significance of these trends was tested 
by a Lindquist Type III analysis of variance 
(1). This analysis of variance is summarized 
in Table 1. In this table the main effects for 
Words, the main effects for Sex, and the 
Words × Sex interaction reflect differences in
the height of the curves. Differences in the
shape of the curves (rate of learning) are reflected
in the Words × Trials and Sex ×
Trials interactions.
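
The structure of this analysis can be checked against the mean squares reported in Table 1. The sketch below assumes the usual mixed-design rule: between-subjects effects are tested against the between error term and within-subjects effects against the within error term. The variable names are hypothetical.

```python
# Degrees of freedom and mean squares as reported in Table 1.
between = {"Words": (1, 93.64), "Sex": (1, 30.00), "Words x Sex": (1, 200.21)}
within = {
    "Trials": (5, 49.45),
    "Trials x Words": (5, 11.47),
    "Trials x Sex": (5, 0.66),
    "Trials x Sex x Words": (5, 4.70),
}
ms_error_between = 18.55  # Error (between)
ms_error_within = 3.20    # Error (within)

# F = effect mean square / appropriate error mean square.
f_ratios = {name: ms / ms_error_between for name, (_, ms) in between.items()}
f_ratios.update({name: ms / ms_error_within for name, (_, ms) in within.items()})

# Degrees of freedom: 80 Ss by 6 blocks of trials.
n_subjects, n_blocks = 80, 6
df_total = n_subjects * n_blocks - 1
df_error_between = (n_subjects - 1) - sum(df for df, _ in between.values())
df_error_within = df_total - (n_subjects - 1) - sum(df for df, _ in within.values())
```

The recomputed ratios agree with the tabled F values to within rounding of the mean squares, and the degrees of freedom partition into 76 for the between error, 380 for the within error, and 479 in total.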

Table 1
Analysis of Variance of Hostile and Neutral Responses
of Men and Women for Blocks of 10 Trials

Source                     df    Mean square      F
Words                       1       93.64        5.05*
Sex                         1       30.00        1.62
Words × Sex                 1      200.21       10.79**
Error (between)            76       18.55
Trials                      5       49.45       15.46**
Trials × Words              5       11.47        3.58**
Trials × Sex                5         .66         .21
Trials × Sex × Words        5        4.70        1.47
Error (within)            380        3.20
Total                     479

* p < .05
** p < .01

The significant F ratios for Words and for
Words × Sex indicate that there are significant
differences in the height of the four
curves. Since there was a significant Words ×
Sex interaction, four comparisons were made:
men’s neutral vs. men’s hostile; women’s neutral
vs. women’s hostile; men’s neutral vs.
women’s neutral; and men’s hostile vs. women’s
hostile. Two of the four comparisons
yielded significant differences. The women’s
hostile curve was (a) significantly lower than
the women’s neutral curve (t = 3.19, p <
.001 for 76 df) and (b) significantly lower
than the men’s hostile curve (t = 2.63, p <
.02 for 380 df).

The significant F ratio for trials indicates
that conditioning did occur (significant increases
in frequency of response on successive
blocks of trials). The significant Words
× Trials interaction suggests that the shape
of the neutral curves was significantly different
from the shape of the hostile curves. Evidently
the rate of learning hostile words is
significantly faster than the rate of learning
neutral words.

The first 10 trials constituted the free responding
period, during which the E said
nothing after all of the Ss’ responses. Since
all Ss were treated alike during this period,
the data from each group may be pooled.

On each trial the S could respond with a
neutral, a mildly hostile, or an intensely hostile
word. In order to simplify the chi square
test of difference in frequency of response, the
neutral and mildly hostile words were pooled.
It was now possible to test the significance of
sex differences in frequency of response of intensely
hostile vs. neutral and mildly hostile
words during the first ten trials.

For the 10-trial period there were 11 possible
combinations of frequencies: 10 neutral-mildly
hostile, 0 intensely hostile; 9 neutral-mildly
hostile, 1 intensely hostile; ... 0 neutral-mildly
hostile, 10 intensely hostile. These
combinations for men and women constituted
a contingency table which yielded a chi square
of 21.2, which is significant at the .01 level
of confidence. Thus the women gave significantly
fewer intensely hostile and significantly
more neutral-mildly hostile responses than the
men during the free responding period.


Discussion


It was predicted that neutral material would
be learned at a faster rate than hostile material.
The results proved otherwise, with hostile
words being conditioned faster than neutral
words. A variable that was not discussed
previously may have contributed to these findings:
size of response class. The hostile verbs
were sampled from a population of a few hundred
hostile verbs, but the population of neutral
verbs is many thousands. Learning of a
large class of responses is expected to be
slower than learning of a small class. Hence
neutral verbs were conditioned at a slower
rate than hostile verbs, despite the inhibition
associated with the latter.

It was also predicted that women would
manifest slower learning of hostile verbalizations
than men. This prediction was not borne
out. Rather, the greater inhibition present in
women was revealed in the absolute number
of hostile verbs used. Both in free responding
and acquisition, women produced significantly
fewer intensely hostile verbs than men. The
intensely hostile verbs referred mainly to
physical violence, e.g., lynched, mauled,
smashed, stabbed, strangled, and tortured.
The actions represented by these words are
severely punished in our culture; they are
occasionally tolerated in men but never in
women. This difference in cultural training
would seem to account for the sex difference
in frequency of hostile response.

However, the inhibition that depressed frequency
of response did not prevent the
women from conditioning. While their fre- 
quency of hostile response was initially low, 
it increased with successive acquisition trials. 
If this finding is confirmed with clinical popu- 
lations, it would prove encouraging, for it 
suggests that initial inhibition of response can 
be overcome by reinforcement. In the investi- 
gation of this and other clinical issues the 
present analogue to a clinical interview would 
seem to be promising. 


Summary 

The task of making up sentences was con- 
ceptualized as a laboratory analogue of a 
clinical interview. There were two groups of 
college students, each group consisting of 20 
men and 20 women. One group was reinforced 
for intensely hostile verbalizations and the 
other group for neutral verbalizations. 

It was found that intensely hostile verbalizations
were conditioned faster than neutral
verbalizations. This was attributed to the
larger size and the diffuseness of the neutral 
class in comparison to the intensely hostile 
response class. Women produced significantly 
fewer intensely hostile responses than men, 
which was held to be consistent with sex dif- 
ferences in cultural training. Initial inhibition 
of response did not prevent conditioning from 
occurring in women. It was suggested that 
both the findings and the technique have 
clinical implications. 

Received November 25, 1957.


Factor Analysis of Interview Interaction Behavior¹

By utilizing an interview standardized along
certain minimal dimensions, it has been pos- 
sible to investigate certain characteristics of 
the overt interaction patterns of different 
groups of subjects (8, 9, 10, 11, 12, 13, 14, 
15, 16, 17). Results to date suggest that in- 
terviewee interaction patterns have the fol- 
lowing characteristics. First, there are wide 
individual differences in interaction patterns 
among subjects. Second, the interviewee in- 
teraction characteristics for any given subject 
are highly stable across two different inter- 
viewers when the latter standardize their in- 
terviewing behavior along the minimal prede- 
fined dimensions of our standardized method, 
and, at the same time, these interviewee char- 
acteristics are modifiable by planned changes 
in the intra-interview behavior of either inter- 
viewer (10, 15, 17). Third, the marked sta- 
bility and modifiability of an individual’s in- 
teraction patterns, found for our first sample 
of subjects (15), were cross-validated in a 
second sample (9). And fourth, the stability 
and modifiability were equally striking when 
only a single interviewer was used and the 
test-retest interval was extended to seven 
days (16), five weeks (17), and eight months 
(17), in contrast to the first two studies 
which employed a test-retest interval of a 
few minutes. 

1 This investigation was supported by a research
grant (M-1938) from the National Institute of Mental
Health, of the National Institutes of Health, U. S.
Public Health Service.

The finding that interviewee interaction
patterns are stable under identical (along
certain dimensions) interpersonal environmental
stimulus conditions and modifiable when
these environmental conditions change can be
thought of as constituting one type of evi- 
dence of the construct validity of the inter- 
action patterns (1, pp. 13-28; 5, p. 288). 
However, most of our research to date has 
concerned itself with establishing the reliabil- 
ity of these interview interaction variables. 
Thus we have studied and established as hav- 
ing extremely high reliability the following 
aspects of interview interaction behavior: the 
reliability of the interviewer who serves as 
the independent variable by conducting the 
standardized interview (9, 15, 17); the reli- 
ability of the interviewee interaction patterns, 
the dependent variables (9, 15, 16, 17); the 
reliability of the observer who observes the 
interviewer-interviewee interaction and rec- 
ords his observations by pressing separate 
keys for each participant (14); and, finally, 
the reliability of the scorer who scores the 
final Interaction Chronograph record (15, p. 
429). 

Having thus established unusually high re- 
liabilities for these interview interaction vari- 
ables, it was appropriate to turn to the ques- 
tion of their validity, or “meaning.” It was 
apparent to us that the question of the va- 
lidity of our measures, as of most personality 
measures derived from other frameworks, was
made all the more difficult to investigate by 
the fact that there exists no well-defined or 
highly integrated interaction theory of per- 
sonality from which hypotheses relevant to 
validity could be generated. For this reason, 
we have approached the validity problem 
from a number of points of view, amounting
largely to what Cronbach and Meehl call the
“bootstraps” method (5, pp. 286-287). When
sufficient studies bearing on validity have been
conducted and reported, it is hoped that a 
preliminary theory of interaction can be for- 
mulated and new, specific hypotheses gener- 
ated and tested. To date we have reported 
one validity study (12) and soon will be re- 
porting two additional ones (11, 13). 

The work here being reported, a factor
analysis of the 12 interview interaction variables
used by us before the recent addition
of two further variables, is an example of
what Cronbach and Meehl call construct va- 
lidity (5, p. 287). The three other validity 
studies are more properly examples of con- 
current validity (1). The first, already re- 
ported, deals with psychological test corre- 
lates of these interview interaction behaviors 
(12). The second, on the relationship be- 
tween some aspects of the verbal content of 
the interview and these formal overt inter- 
view interaction patterns (13), and the third 
on differences in interview interaction patterns 
among five different psychodiagnostic groups 
(11), are now in preparation. 


Procedure

The data which were factor-analyzed for
the present study consisted of the standardized
interview interaction scores of 60 subjects
(10). The Ss were unselected and were
interviewed in their order of referral to the 
psychiatric outpatient clinic of a large, urban 
medical center. There were 32 females and 28 
males, all white, ranging in age from 16 to 62 
(median age of 33), who had presenting 
complaints typical of outpatients at most psy- 
chiatric facilities. The standardized interview 
was conducted by two experienced interview- 
ers. For purposes of establishing reliability of 
the measures, each S was interviewed twice. 
For 40 of the Ss, the two interviews were con- 
ducted during the same afternoon, usually 
with a 2- to 5-minute interval between inter- 
views, and in counterbalanced order by the 
two interviewers. All of the test-retest inter- 
views with the remaining 20 Ss of the present 
study were seven days apart and were con- 
ducted, on both occasions, by only one of 
these two interviewers (the psychiatrist). 

The Interaction Chronograph, devised by Chapple
(2, 3, 4) for recording certain temporal aspects of
verbal and gestural interaction, was the instrument
of measurement used during the 60 interviews. The
rules of the standardized interview (8) require the
interviewer to follow certain simple predefined be- 
havioral patterns (for example, that each of the in- 
terviewer’s comments should be 5 seconds in dura- 
tion). In addition, the rules require the interviewer 
to vary his behavior somewhat during each of five 
predefined subperiods of the interview. The inter- 
viewer’s varied behavior during these five periods 
constitutes a sequence of different behavioral situa- 
tions, or what can be thought of as varying minia- 
ture interpersonal situations, designed to elicit the 
interviewee’s characteristic interaction patterns un- 
der different interpersonal conditions. Periods 1, 3, 
and 5 consist of free give-and-take interviewing, 
while Periods 2 (silence) and 4 (interruption) are 
stress phases of the interview (8). It should be noted 
that the stress of Periods 2 and 4 is provided by the 
overt behavior (“not responding” and “interrupting,” 
respectively) of the interviewer, and not by the 
“content,” or what he says to the interviewee. The 
durations of Periods 1, 3, and 5 are fixed (10, 5, 
and 5 minutes, respectively), whereas the durations 
of Periods 2 and 4 are variable. The latter periods 
consist, respectively, of 12 failures to respond or 15 
minutes, whichever is shorter, and 12 interruptions 
by the interviewer or 15 minutes, whichever is 
shorter. The length of the standardized interview 
ranged for these 60 Ss from 24.5 minutes with one 
patient to the full fifty (50.3) minutes for another. 
The mean length of the interview was 32 minutes 
with this outpatient sample. 

More complete definitions of the interaction variables
here factor analyzed can be found in our
earlier reports (8, 15). Briefly, they are:

(a) Pt.’s Units, the number of times the patient acted;
(b) Pt.’s Action, the average duration of the patient’s
actions;
(c) Pt.’s Silence, the average duration of the patient’s
silences;
(d) Pt.’s Tempo, the average duration of each action
plus its following inaction, as a single measure;
(e) Pt.’s Activity, the average duration of each action
minus its following inaction, as a single measure;
(f) Pt.’s Adjustment, the durations of the patient’s
interruptions minus the durations of his failures to
respond, divided by Pt.’s Units;
(g) Dr.’s Adjustment, the durations of the interviewer’s
interruptions minus the durations of his failures to
respond, divided by Pt.’s Units;
(h) Pt.’s Synchronization, the number of times the
patient interrupted or failed to respond to the doctor,
divided by the number of Pt.’s Units;
(i) Dr.’s Units, the number of times the doctor acted;
(j) Pt.’s Initiative, the percentage of times, out of the
available number of opportunities (usually 12) in
Period 2, in which the patient acted again (within
a 15-second limit) following his own last action;
(k) Pt.’s Dominance, the number of times in Period 4
that the patient “talked down” the doctor minus the
number of times the doctor “talked down” the patient,
divided by the number of Pt.’s Units in that period;
(l) Pt.’s Quickness, the average length of time in
Period 2 that the patient waited before taking the
initiative following his own last action.

Table 1

Intercorrelations (Pearson r’s) Among Interaction Variables Based on Interviews in a Sample of 60 Subjects;
First Interviews Above the Diagonal, Second Interviews Below the Diagonal

Variables

1. Pt.’s Units
2. Pt.’s Action
3. Pt.’s Silence
4. Pt.’s Tempo
5. Pt.’s Activity
6. Pt.’s Adjustment
7. Dr.’s Adjustment
8. Pt.’s Synchronization
9. Dr.’s Units
10. Pt.’s Initiative (Per. 2)
11. Pt.’s Dominance (Per. 4)
12. Pt.’s Quickness (Per. 2)

Note. For r = .25, p < .05.
For r = .32, p < .01.
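
To make the first five definitions concrete, the sketch below computes Units, Action, Silence, Tempo, and Activity from a record of paired durations. It is a simplification, not the Interaction Chronograph scoring procedure: it assumes every action is followed by exactly one silence, and both the function name and the sample durations are hypothetical.

```python
from statistics import mean

def interaction_scores(units):
    """Score a list of (action_seconds, following_silence_seconds) pairs."""
    actions = [a for a, _ in units]
    silences = [s for _, s in units]
    return {
        "units": len(units),                            # Pt.'s Units
        "action": mean(actions),                        # Pt.'s Action
        "silence": mean(silences),                      # Pt.'s Silence
        "tempo": mean([a + s for a, s in units]),       # Pt.'s Tempo
        "activity": mean([a - s for a, s in units]),    # Pt.'s Activity
    }

# Hypothetical record: three actions, each followed by one silence.
sample = [(10.0, 2.0), (20.0, 4.0), (30.0, 6.0)]
scores = interaction_scores(sample)
```

Under this simplification Tempo equals Action plus Silence and Activity equals Action minus Silence, so some of the 12 variables are algebraically related by construction.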


Results 


The retest standardized interview on each 
of the 60 Ss made it possible to replicate and 
thus check the reproducibility (through an 
attempt at cross-validation) of the results of 
the factor analysis of the 60 first interviews. 
Table 1 shows the matrix of intercorrelations
(Pearson r’s)² of the 12 interview
interaction variables derived from the 60 interviews.
The intercorrelations for both the
first and second (replication) interviews are 
included in the one table in order to facili- 
tate comparison. Intercorrelations for the 
data of the first interview are given above 
the diagonal in Table 1, while the intercor- 
relations for the second interview are shown 
below the diagonal. As can be seen, the in- 
tercorrelation values for the replication inter- 
view are strikingly similar to those shown for 
the original interview and constitute a clear 
cross-validation of the findings with the 60 
first interviews. 

2 The frequency distribution of the interaction
scores of 134 Ss, representing a normal population
as well as several diagnostic groups, sufficiently
approximated a bell shape to meet the required
statistical assumption of a normal distribution. Pt.’s
Initiative deviated somewhat, with some tailing-off
at the low end (0.00) and piling-up at the high end
(1.00).

3 For the 60 Ss of this study, the range for the
combined length of the two variable periods (Periods
2 and 4) was from 4 minutes to 30 minutes,
with a mean of 12 minutes.

Before going on to the factor analysis of
these data, several points in Table 1 are
worth noting. The first of these is the high
negative correlation (-.73) between how
often a patient speaks (Pt.’s Units) in the
standardized interview and the average duration
of each of these units of utterance (Pt.’s
Action). It would appear that people who
speak often in the standardized interview do
so in relatively brief utterances, whereas people
who speak infrequently tend to do so
in correspondingly longer utterances. On the
other hand, since a correlation between two
events merely indicates an existing relationship
but tells us nothing about which event
influences the other, one can also interpret
this -.73 relationship to mean that people
who habitually speak in brief utterances do
so relatively frequently, whereas individuals
whose utterances tend on the average to be
quite lengthy seem to speak much less often.
This finding would indeed be an interesting
addition to our knowledge of human behavior
were it not for the possibility that it may
represent an artifact of the standardized interview.
It will be remembered that three periods
(1, 3, and 5), constituting 20 minutes
of the standardized interview, are of fixed
duration, whereas, on the average, only 12
minutes (but potentially 30 minutes for all
Ss) are of variable length.³ Thus, in a partially
“closed system,” whose limits on durations
(and thus indirectly on units) are prescribed
in advance, it is not surprising that
a negative correlation should exist between 
number of units and average duration of ac- 
tion. It is clear that an S who acts 112 times 
during a 32-minute interview must do so in 
briefer durations than a second S who inter- 
acts only 25 times during the same period. 
The high negative correlation is probably not 
all artifact, however, since up to 30 minutes 
of the interview are variable and unfixed in 
regard to duration, with the interviewer ter- 
minating each of Periods 2 and 4 after any- 
where between 1 and 15 minutes, depending 
on the patient’s “unrestricted,” in this sense, 
behavior in these periods. However, it is pos- 
sible better to test the degree of correlation 
between number of units and average duration 
of action by observing a series of pairs of two 
individuals in a free conversation, or a single 
pair of individuals in a series of free conversa- 
tions (free of predetermined time limits). We 
plan to carry out such studies as a more ade- 
quate test of this initial finding. 
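
The "closed system" argument can be made concrete with a small simulation. The sketch below is not the authors' procedure: it assumes a hypothetical interview of fixed 32-minute length, draws each subject's number of units between the 25 and 112 mentioned in the text, and assumes (arbitrarily) that each subject fills between 40% and 80% of the time with speech, independently of how many units he produces. Even under this independence assumption a strong negative correlation between units and average duration appears.

```python
import random

INTERVIEW_SECONDS = 32 * 60  # the 32-minute interview used as an example in the text


def max_mean_seconds_per_unit(n_units, total_seconds=INTERVIEW_SECONDS):
    """Upper bound on average utterance length if n_units must fit in the interview."""
    return total_seconds / n_units


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_closed_interview(n_subjects=60, seed=0):
    """Hypothetical subjects: units and talk fraction are drawn independently."""
    rng = random.Random(seed)
    units, mean_action = [], []
    for _ in range(n_subjects):
        n_units = rng.randint(25, 112)         # range taken from the text's example
        talk_fraction = rng.uniform(0.4, 0.8)  # assumed share of time spent speaking
        units.append(n_units)
        mean_action.append(talk_fraction * INTERVIEW_SECONDS / n_units)
    return units, mean_action


units, mean_action = simulate_closed_interview()
r_units_action = pearson_r(units, mean_action)

print(f"112 units: at most {max_mean_seconds_per_unit(112):.1f} s per unit")
print(f" 25 units: at most {max_mean_seconds_per_unit(25):.1f} s per unit")
print(f"simulated r(units, mean action) = {r_units_action:.2f}")
```

Because the negative correlation arises here from the fixed total time alone, a simulation of this kind cannot say how much of the observed −.73 is artifact; it only shows that some of it could be.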

A second finding of interest in Table 1 is 
the relative independence of Pt.’s Units and 
the following variables: Pt.’s Silence; Pt.’s 
Adjustment; Dr.’s Adjustment; Pt.’s Initia- 
tive; Pt.’s Dominance; and Pt.’s Quickness. 
In part, this is in accordance with Chapple’s 
original plan since, as with a technique like 
the Rorschach, in order to make comparable 
the results from Ss with different numbers of 
interaction units, all Interaction Chronograph 
variables are divided by the number of units 
for that individual (8, p. 354). It is clear, 
however, that even though the influence upon 
all variables of different numbers of units 
among different Ss has been thus partly con- 
trolled, there is still a highly significant nega- 
tive correlation between number of units and 
one of these variables, average duration of 
action (described above). That is, after relating all variables to
the same scale, Pt.’s Units still correlates with Pt.’s Action,
whereas Pt.’s Units does not correlate materially with Pt.’s
Silence, Pt.’s Adjustment, Pt.’s Initiative, Pt.’s Dominance, and
Pt.’s Quickness. Thus, in descriptive language it would appear that
how long a person is silent (in surprising contrast to how long he
acts); how he maladjusts, when he does so in two-person
interactions; how often he takes the initiative; how quick he is to
do so; and how often he dominates or is dominated are all
behavioral characteristics which are independent of his total
number of units (i.e., his frequency of interaction).

Other findings of interest seen in the first 
interview data in Table 1 are: the moderate 
negative correlation between action and si- 
lence (—.30); the moderate positive correla- 
tion between duration of action and frequency 
of initiative-taking (.25); the independence 
of duration of action and frequency of domi- 
nance (.12); the high negative correlation 
between silence and initiative (—.68), i.e., 
the person with the longest silences shows 
the fewest initiatives and vice versa; the posi- 
tive correlation between adjustment and ini- 
tiative (.50), and between adjustment and 
dominance (.44), and between adjustment 
and quickness (.61); the moderate positive 
correlation between Pt.’s Initiative and Pt.’s 
Dominance (.33), which, however, was not 
cross-validated (.05); and, finally, the high 
correlation between how often one takes the 
initiative in Period 2 and how quick he is to 
do it (.76). 

Both sets of intercorrelations shown in 
Table 1 were factor analyzed, the factors 
being extracted by using Thurstone’s centroid 
method⁴ (18, pp. 161-170). In Table 2 are
shown the original centroid factor loadings 
and the rotated factor loadings for the 60 
first interviews, while in Table 3 are shown 
similar values for the sample of 60 second 
(replication) interviews. 
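
For readers unfamiliar with the centroid method, the first step can be sketched in a few lines. This is a simplified illustration, not the IBM procedure the authors used: the 3 x 3 correlation matrix is hypothetical, unities are placed in the diagonal (an assumption; communality estimates are often used instead), and the sign-reflection step needed when some column sums are negative is omitted.

```python
import math


def first_centroid_factor(R):
    """First centroid loadings: each column sum divided by the square root
    of the grand sum of the correlation matrix."""
    n = len(R)
    col_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(col_sums)
    return [s / math.sqrt(grand_total) for s in col_sums]


def residual_matrix(R, loadings):
    """Correlations left over after removing the first factor: R - a a'."""
    n = len(R)
    return [[R[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical correlation matrix, all entries positive so no reflection is needed.
R = [
    [1.00, 0.60, 0.50],
    [0.60, 1.00, 0.40],
    [0.50, 0.40, 1.00],
]

loadings = first_centroid_factor(R)
residual = residual_matrix(R, loadings)

print("first-factor loadings:", [round(a, 3) for a in loadings])
print("grand sum of residuals:", round(sum(map(sum, residual)), 6))
```

The residual matrix sums to zero by construction, which is why later factors in the full method can only be extracted after reflecting the signs of some variables.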

From an inspection of Tables 2 and 3 it is clear that the results
of the factor analysis of the 60 first interviews were highly
reliably cross-validated on the sample of 60 second interviews.
Thus, it is necessary only to describe the results shown in Table
2. From the column of rotated factor loadings under Factor I in
this table it would appear that this first factor is in part a
latency, or speed of response measure. That is, it reflects how
long a patient is silent, on the average, before he responds to the
interviewer’s last utterance. For simplicity one can label this a
“silence” factor. Patient interview interaction variables having
the highest rotated loadings on this first factor are Pt.’s Silence
(.93), Pt.’s Adjustment (−.73), Pt.’s Synchronization (.51), Pt.’s
Initiative (−.74), Pt.’s Dominance (−.57), and Pt.’s Quickness
(−.91). By Interaction Chronograph measurement, Pt.’s Silence and
Pt.’s Quickness are both measures of a patient’s average duration
of interview silence: the former variable measures this for all
five periods, the latter variable only during Period 2. The minus
sign of the factor loading for Pt.’s Quickness (−.91) should be
disregarded in the interpretation of this first factor as the
negative direction is an artifact of measurement due solely to the
inverse manner in which this variable is recorded in the present
model of the Interaction Chronograph. (One or two other variables
pose similar but unimportant “sign” artifacts.) Pt.’s Adjustment,
which has the next highest loading on Factor I, is also a silence
measure, i.e., the average of the durations of a patient’s
interruptions minus the durations of his failures to respond
(silences).


4 In extracting the factors, using Thurstone’s centroid method as a
base, computations were made on IBM machines, using procedures
developed by Arthur Couch with assistance from Eugene Kassbaum,
both of the Department of Social Relations at Harvard University.


Table 2
Factors for Interaction Variables Based on First Interview of 60 Patients

                                    Rotated Loadings
Variables                         I (Sil.)    II (Act.)

 1. Pt.’s Units                      .05        −.94
 2. Pt.’s Action                    −.18         .87
 3. Pt.’s Silence                    .93        −.13
 4. Pt.’s Tempo                     −.12         .93
 5. Pt.’s Activity                  −.28         .93
 6. Pt.’s Adjustment                −.73         .15
 7. Dr.’s Adjustment                −.83         .05
 8. Pt.’s Synchronization            .51        −.61
 9. Dr.’s Units                      .08        −.93
10. Pt.’s Initiative (Per. 2)       −.74         .20
11. Pt.’s Dominance (Per. 4)        −.57        −.05
12. Pt.’s Quickness (Per. 2)        −.91         .25

(Centroid loadings, rotated loadings for Factors III and IV, and
communalities are not legible.)


Table 3
Factors for Interaction Variables Based on Second Interview of 60 Patients

Variables

 1. Pt.’s Units
 2. Pt.’s Action
 3. Pt.’s Silence
 4. Pt.’s Tempo
 5. Pt.’s Activity
 6. Pt.’s Adjustment
 7. Dr.’s Adjustment
 8. Pt.’s Synchronization
 9. Dr.’s Units
10. Pt.’s Initiative (Per. 2)
11. Pt.’s Dominance (Per. 4)
12. Pt.’s Quickness (Per. 2)

(Centroid and rotated loadings are not legible.)

The last two variables with loadings on 
this factor, Pt.’s Initiative and Pt.’s Domi- 
nance, are both frequency measures and thus 
differ from the former, all of which are dura- 
tion measures. Pt.’s Initiative represents the 
percentage of times the patient took the ini- 
tiative (i.e., spoke again following his own 
last utterance) during the interviewer’s 12 
prescribed failures to respond in Period 2. 
Thus, in a sense, it might be speculated that 
the Pt.’s Initiative variable could reflect a 
“drive”⁵ aspect of human interaction (i.e.,
an additional parameter of an individual’s 
frequency of interaction) or, at the least, 
more than what is measured by the duration 
measure, Pt.’s Silence, alone. That is, how 
long one waits before he acts (Pt.’s Silence) 
as an Interaction Chronograph measure, 
would seem not to reflect all that is involved 
in the Pt.’s Initiative variable and vice versa. 
For there are some patients who will act, i.e., 
will take the initiative, and there are others 
who will not under conditions of interviewer 
silence. Both types of patients receive dura- 
tions of silence scores, while only the for- 
mer group receives an initiative score for this 
unit of interaction. (The tenability of this 
dual-nature hypothesis of Pt.’s Initiative is 
strengthened by the fact that Pt.’s Initiative 
is the best measure of the Fourth Factor, 
with a factor loading of .50 on it. This will 
be described below.) 

The factor loading of −.57 for Pt.’s Dominance on Factor I
indicates that there may be still a third parameter present in this
“duration of silence” first factor. That is, this factor loading of
−.57 for Pt.’s Dominance on Factor I (“silence”) shown in Table 2,
and the r of −.52 between Pt.’s Dominance and Pt.’s Silence shown
in Table 1, indicate that people who tend to have long durations of
silence during their interview also tend significantly not to
dominate or out-talk the interviewer when he interrupts them 12
times in Period 4. This finding, that patients with longer average
interview silences exhibit a greater frequency of terminations of
speech (submissions) when interrupted, would seem to open up a new
area for future research to further our understanding of still
other aspects of two-person interactions.


5 Some evidence for this hypothesis (12, p. 332) comes from a
cross-validated statistically significant r of .44 between our
Pt.’s Initiative variable and a patient’s score on the Taylor
Anxiety Scale (purported to be a measure of Hullian drive level).

The second rotated factor shown in Table 2 
(and cross-validated in Table 3) appears to 
reflect “duration of patient action per unit,” 
i.e., how long on the average the patient spoke 
or otherwise communicated during each of his 
interaction units. This “action” factor reflects 
the earlier discussed high negative correlation 
between Pt.’s Units and Pt.’s Action, as can 
be seen by the factor loading of —.94 for 
Pt.’s Units and of .87 for Pt.’s Action on
Factor II. Other patient variables with high 
loadings on the second factor are: Tempo 
and Activity (both .93), and Synchronization 
(—.61). The loading of —.93 for Dr.’s 
Units on this factor reflects the r’s of .97 and 
.95 (shown in Table 1, above and below the 
diagonal, respectively) between Pt.’s Units 
and Dr.’s Units. The latter near unity corre- 
lations are not surprising since the rules of 
the partially standardized interview require 
the interviewer (in all periods but the second) 
to take his lead from the interviewee and to 
respond with a five-second utterance to each 
of the patient’s communication units, thus 
earning the interviewer one Dr.’s Unit for 
each Pt.’s Unit. 

The loadings on Factor II of .93 for Pt.’s 
Tempo and .93 for Pt.’s Activity are interest- 
ing since they indicate that the “action” fac- 
tor is highly saturated with these two derived 
interview interaction variables. Since these 
same variables do not appear in the “silence,” 
or first factor (loadings of —.12 and —.28, 
respectively), it would appear that, as inter- 
view variables, Pt.’s Tempo and Pt.’s Activity 
are made up primarily of Pt.’s Action rather 
than Pt.’s Silence; despite the fact that Pt.’s 
Tempo is scored as the average sum of Pt.’s 
Action plus Pt.’s Silence while Pt.’s Activity 
is scored as the average Pt.’s Action minus 
Pt.’s Silence. This fact is shown even more clearly in Table 1,
where it is seen that Pt.’s Action has very high correlations with
Pt.’s Tempo and Pt.’s Activity (.92 and .88, respectively, in the
first interview; and .95 and .95 in the replication interview)
whereas Pt.’s Silence has only minimal correlation with these same
variables (−.22 and −.41, respectively; and −.19 and −.36). In part
this is due to the restricted range of durations 
of silence (from 5 to 19 hundredths of a 
minute for this sample) relative to the greater 
range of durations of action (from 11 to 257 
hundredths of a minute) (10). 
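
The effect of the unequal ranges can be worked out from the standard formula for the correlation of a variable with a sum or difference that contains it. The sketch below is a rough illustration under stated assumptions: the standard deviations of 60 and 3.5 hundredths of a minute are hypothetical values chosen only to respect the ranges quoted above, and the Action-Silence correlation of −.30 is the first-interview value reported in the text. Tempo is taken as Action + Silence and Activity as Action − Silence, as described in the text.

```python
import math


def corr_action_with_combo(sd_action, sd_silence, rho, sign):
    """r between Action and (Action + sign * Silence); sign is +1 or -1."""
    cov = sd_action ** 2 + sign * rho * sd_action * sd_silence
    var_combo = (sd_action ** 2 + sd_silence ** 2
                 + 2 * sign * rho * sd_action * sd_silence)
    return cov / (sd_action * math.sqrt(var_combo))


def corr_silence_with_combo(sd_action, sd_silence, rho, sign):
    """r between Silence and (Action + sign * Silence); sign is +1 or -1."""
    cov = rho * sd_action * sd_silence + sign * sd_silence ** 2
    var_combo = (sd_action ** 2 + sd_silence ** 2
                 + 2 * sign * rho * sd_action * sd_silence)
    return cov / (sd_silence * math.sqrt(var_combo))


SD_ACTION = 60.0    # hypothetical, hundredths of a minute
SD_SILENCE = 3.5    # hypothetical, hundredths of a minute
RHO = -0.30         # Action-Silence r reported for the first interview

r_action_tempo = corr_action_with_combo(SD_ACTION, SD_SILENCE, RHO, +1)
r_action_activity = corr_action_with_combo(SD_ACTION, SD_SILENCE, RHO, -1)
r_silence_tempo = corr_silence_with_combo(SD_ACTION, SD_SILENCE, RHO, +1)
r_silence_activity = corr_silence_with_combo(SD_ACTION, SD_SILENCE, RHO, -1)

print(f"Action  vs Tempo / Activity: {r_action_tempo:.2f}, {r_action_activity:.2f}")
print(f"Silence vs Tempo / Activity: {r_silence_tempo:.2f}, {r_silence_activity:.2f}")
```

Under these assumed values the derived scores are dominated by Action, while their correlations with Silence stay small and negative, which is the same pattern as in Table 1. The exact figures depend on the assumed standard deviations and should not be read as estimates.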

Thus the high loadings of Pt.’s Tempo and 
Pt.’s Activity, as well as that of Dr.’s Units 
on Factor II, would appear to be reflections 
of the derived nature of these two former 
variables, including the differences in range 
for silence and action, and the standardized 
rules governing the interviewer’s behavior in 
regard to the Dr.’s Units variable. With the 
exception of Pt.’s Synchronization, which is 
highly correlated with Pt.’s Units (see Table 
1), the absence of significant loadings of any 
of the other interview variables on Factor II 
would indicate that it is purely a reflection 
of how often a patient communicates (Pt.’s 
Units) and how long on the average he talks 
once he has begun (Pt.’s Action). Although 
Factor II justifiably could thus equally be 
thought of as reflecting units, or one’s num- 
ber of interactions (a frequency measure), we 
prefer for the time being to call it “action” 
and thus think of it as a duration measure. 
Only further research will permit us to begin 
to unravel the interesting question of whether 
one’s frequency of (inter)action determines 
his duration of action, or vice versa, or neither 
(as, for example, in free in contrast to stand- 
ardized interviewing). 

Rotated Factor III in Table 2 appears to be a much weaker factor
than either Factor I or Factor II, with no factor loading exceeding
.36 in absolute value. This weak factor is saturated most heavily
with Pt.’s Adjustment (factor loading of −.36), and less so by
Pt.’s Units (.20), Pt.’s Action (.20), Pt.’s Tempo (.20), Dr.’s
Adjustment (.30), Dr.’s Units (.24), and Pt.’s Quickness (.18).
These results are cross-validated for the most part in Table 3,
although, in this table, Pt.’s Synchronization (the frequency with
which the patient maladjusts, which with the typical subject
involves a frequency count of how often he fails to respond quickly
to the interviewer rather than how often he interrupts the
interviewer) has the highest factor loading (.45). Since all the
other interview variables with loadings on Factor III in Tables 2
and 3 reflect a duration measure in one way or another (either of
actions or silences), and since Pt.’s Adjustment has one of the
highest factor loadings, we tentatively will consider this factor
to reflect one’s “average duration of (mal)adjustments”;
remembering, however, that the frequency of these maladjustments
(i.e., Pt.’s Synchronization) also may be reflected in this third
factor. That Factor III is neither well established nor strongly
present and thus is a weak factor, however, is also clear from the
results.⁶

In Table 2 Rotated Factor IV seems to be most heavily saturated
with Pt.’s Initiative (factor loading of .50), although Pt.’s
Synchronization has a loading of −.45 on this same factor. On
replication (Table 3), the factor loading for Pt.’s Initiative is
cross-validated (.48), while that for Pt.’s Synchronization is only
moderately so (−.27). Pt.’s Dominance, which in Table 2 had an
insignificant loading of −.18 on this fourth factor, shows a
significant loading of −.42 during the second, or retest, interview
(Table 3). However, since the test-retest reliability for
measurement of the Pt.’s Dominance variable is only moderate (9, p.
273; 14, pp. 272-273), in contrast to the very high reliability for
all the other variables, the full meaning of this finding remains
unclear. Thus, in view of the fact that Pt.’s Initiative had the
highest loading on Factor IV during the first interview (.50) and
this was cross-validated in the second interview (.48), while the
significant loadings for the other variables were not
cross-validated by appearing in both interviews, we tentatively
will consider that the fourth factor reflects primarily the
habitual degree of “initiative” which an S will manifest under
conditions of planned interviewer silence.


6 Further evidence for the tentative nature of Factor III (and
Factor IV) comes from a second factor analysis of the data in Table
1, done for us by W. A. Botzum and Charlotte David of the
University of Portland. Using an oblique solution in contrast to
the orthogonal solution shown in Tables 2 and 3, their analyses
confirmed our Factors I and II, but indicated that Dr.’s Adjustment
and Pt.’s Quickness had the highest consistent loadings on Factor
III (the high factor loading for Dr.’s Adjustment on Factor III is
also present in Tables 2 and 3), and that, whereas Pt.’s Initiative
had a high loading on Factor IV in the first interview, this was
not cross-validated in the data from the second interview.

In the last column of Table 2 (and Table 3) are shown the
communalities (h²) of each interview interaction variable. The
values in this column represent the amount (proportion) of the
variance of each interview variable which is accounted for by our
four factors (6, p. 492). It is clear from Table 2 (and
cross-validated in Table 3) that the four factors which have
emerged from the factor analysis (pt.’s “silence” behavior, pt.’s
“action” behavior, pt.’s “adjustment” behavior, and pt.’s
“initiative” behavior) exhaust almost the total variance of the
following interview variables: Pt.’s Units (93%),⁷ Pt.’s Action
(85%), Pt.’s Silence (90%), Pt.’s Tempo (96%), Pt.’s Activity
(98%), Dr.’s Units (96%), and Pt.’s Quickness (93%). They account
for the variance of Dr.’s Adjustment (79%), Pt.’s Synchronization
(84%), and Pt.’s Initiative (84%) to a slightly smaller extent. The
variances of two interview variables are not materially accounted
for by these four rotated factors. They are Pt.’s Adjustment (69%),
which most helps to define our weak third factor, and Pt.’s
Dominance (36%) which, with a cross-validated moderate factor
loading on Factor I, and a noncross-validated loading on Factor IV,
has only about one-third of its variance accounted for by the four
factors. In view of these findings it seems safe to conclude that,
on the one hand, more will have to be learned about the Pt.’s
Adjustment variable and, on the other hand, as suggested by us in
an earlier publication (14, p. 272), considerable error variance
still exists in the measurement of our Pt.’s Dominance variable.
Refinements in definition, or increase in the length of the
interruption subperiod (for example, lengthening it to a fixed 15
minutes rather than leaving it free as now, with an average
duration of only approximately two to four minutes) may permit us
to reduce the aforementioned error variance of Pt.’s Dominance and
thus learn more about its nature.


7 These values more correctly should be expressed as proportions
rather than as percentages.

By squaring each row entry in each of the 
four columns of rotated factor loadings in 
Tables 2 and 3 and then summing down the 
columns for each of these four rotated fac- 
tors, one can obtain an estimate of how much 
of the total variance is accounted for by each 
factor (6, p. 492). Rounding off, the four fac- 
tors named at the bottom of each column ac- 
count individually for 41%, 47%, 5%, and 
7% of the total variance, respectively, in 
Table 2, and 38%, 44%, 9%, and 9%, re- 
spectively, in Table 3. It is thus clear that, 
in terms of the Interaction Chronograph vari- 
ables so far employed by us, an S’s interview 
interaction behavior consists primarily of ac- 
tions and silences; and moderately of his 
pattern of adjustment (and synchronization) 
with the interviewer, on the one hand, and 
his degree of initiative-taking when the inter- 
viewer attempts to assess this by purposely 
remaining silent, on the other. To date we 
have not worked with Chapple’s temperament 
and personality variables (4). Although these 
represent various arithmetic combinations or 
ratios of the 12 variables shown in Table 1 
of the present paper they, nevertheless, still 
may contribute additional factors other than 
the four reported in the present paper. 
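
The squaring-and-summing procedure described above can be sketched for the two strong factors. The loadings below are the rotated Factor I and Factor II values given for Table 2 (those for Pt.'s Synchronization are taken from the running text), and the communalities are the percentages quoted in the text, expressed as proportions. Dividing each column's sum of squares by the sum of the communalities is our assumption about how the reported percentages were normalized; the source does not state the denominator.

```python
# Rotated loadings (Factor I "silence", Factor II "action") from Table 2.
ROTATED_I_II = {
    "Pt.'s Units": (0.05, -0.94),
    "Pt.'s Action": (-0.18, 0.87),
    "Pt.'s Silence": (0.93, -0.13),
    "Pt.'s Tempo": (-0.12, 0.93),
    "Pt.'s Activity": (-0.28, 0.93),
    "Pt.'s Adjustment": (-0.73, 0.15),
    "Dr.'s Adjustment": (-0.83, 0.05),
    "Pt.'s Synchronization": (0.51, -0.61),
    "Dr.'s Units": (0.08, -0.93),
    "Pt.'s Initiative": (-0.74, 0.20),
    "Pt.'s Dominance": (-0.57, -0.05),
    "Pt.'s Quickness": (-0.91, 0.25),
}

# Communalities (h^2) quoted in the text as percentages.
COMMUNALITY = {
    "Pt.'s Units": 0.93, "Pt.'s Action": 0.85, "Pt.'s Silence": 0.90,
    "Pt.'s Tempo": 0.96, "Pt.'s Activity": 0.98, "Pt.'s Adjustment": 0.69,
    "Dr.'s Adjustment": 0.79, "Pt.'s Synchronization": 0.84,
    "Dr.'s Units": 0.96, "Pt.'s Initiative": 0.84,
    "Pt.'s Dominance": 0.36, "Pt.'s Quickness": 0.93,
}

# Sum of squared loadings down each column.
ss_factor_1 = sum(f1 ** 2 for f1, _ in ROTATED_I_II.values())
ss_factor_2 = sum(f2 ** 2 for _, f2 in ROTATED_I_II.values())

# Assumed denominator: variance accounted for by all four factors together.
common_variance = sum(COMMUNALITY.values())
share_1 = ss_factor_1 / common_variance
share_2 = ss_factor_2 / common_variance

print(f"Factor I : sum of squares {ss_factor_1:.4f}, share {share_1:.2f}")
print(f"Factor II: sum of squares {ss_factor_2:.4f}, share {share_2:.2f}")
```

Under this assumption the two shares come out close to the 41% and 47% reported for Table 2, which suggests that the percentages in the text refer to the variance accounted for by the four factors rather than to the total variance of the 12 variables.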


Discussion 


The results of the present study may be 
compared to the findings reported by Lundy 
in an unpublished study (7). By employing 
two therapists with one patient for 38 inter- 
views (each therapist saw the patient indi- 
vidually for 19 interviews in an odd-even 
design), Lundy was able to relate interview 
Interaction Chronograph variables to certain 
clinical impressions regarding the progress of 
psychotherapy as these qualitative data were 
assessed from the typewritten interview tran- 
scripts. In addition to demonstrating some 
interesting relationships of this type in a 
study which used “free” psychotherapy in- 
terviewing, in contrast to the standardized 
interview employed by us (8), Lundy reports 
the results of a factor analysis of the Inter- 
action Chronograph variables for the inter- 
view interactions of his one patient with each 
of the two therapists. Thus, as was the case 
in our study, Lundy’s use of two therapists
with the same S enabled him to attempt to 
cross-validate his findings. One difference be- 
tween his study and ours was that we used 
60 Ss, each interviewed twice (once by each 
of two interviewers) while Lundy used one 
S interviewed 19 times by each of two inter- 
viewers. In addition, while Lundy’s Interac- 
tion Chronograph variables differed slightly 
from those used by us and described above, 
they were sufficiently similar to permit com- 
parison. 

Lundy’s factor analysis revealed five fac- 
tors. Three of his factors were identified con- 
sistently on replication (using his second set 
of 19 interviews). Although given slightly dif- 
ferent names by Lundy (Speed of Response, 
Amount of Therapist Talk, and Dominance) 
the first two of these are clearly similar to 
our Factors I and II; i.e., our patient’s “av- 
erage duration of silence” before he spoke 
again, and “average duration of patient ac- 
tion.” Superficially, this latter “duration of 
patient (talk) action” factor, which we found 
as our Factor II, may differ from Lundy’s 
own Factor II, “amount of therapist (talk) 
action.” However, since our therapist was in- 
structed always to speak in 5-second utter- 
ances, thus making the average amount of 
therapist talk a constant (5 seconds) from 
one patient to the next, there was no oppor- 
tunity for this therapist variable to reflect it- 
self in one of our factors. 

There is good evidence (8, p. 357), how- 
ever, for a high positive correlation between 
amount of talk produced by each of two con- 
versationalists in free conversation (such as 
in Lundy’s unstandardized psychotherapy in- 
terviews). Thus one might hypothesize that 
had our interviewers been allowed to talk 
freely, as did Lundy’s, and had we then re- 
corded durations of therapist actions, we also 
might have found a high factor loading for 
amount of therapist talk on our Factor II. In 
partial support of this is Lundy’s report that 
his single patient talked more with the thera- 
pist who talked more, and talked less with 
the therapist who talked less. This positive covariation which
Lundy reports, while possibly surprising, confirms our own and
Goldman-Eisler’s similar and independent observation (8, pp.
357-358) and thus adds support to our belief that the reverse is
also true; i.e., that a therapist talks more with a patient who
talks in long utterances, and less with one who speaks for brief
durations. If this is true, Lundy’s second factor and ours appear
to be identical.

Lundy’s third factor, Dominance, differs 
from our third factor, which we chose to call 
Pt.’s Adjustment. Since our third factor was 
the weakest, and since our earlier results have 
shown that our Dominance variable (and pre- 
sumably Lundy’s also) contains considerable 
measurement error variance, we are not sur- 
prised at the lack of agreement between our 
own third factor and Lundy’s. However, 
Lundy does report that his Pt.’s Adjustment 
variable also had a high loading on his third 
factor, and thus his third factor may be simi- 
lar to ours. 

Lundy’s two weakest and also not cross- 
validated fourth and fifth factors were Fre- 
quency of Client Response and Client Vari- 
ability. The former is similar to our Pt.’s 
Units variable and thus may be similar to 
our second factor which, as will be remem- 
bered, though called “action” by us was also 
the factor most highly saturated with our fre- 
quency, or Pt.’s Units, variable. Lundy’s last 
factor, Client Variability, is one which is 
made up of such derived or second-order in- 
dices as the variation of the first-order vari- 
ables Tempo and Activity from one part of 
the interview to another. Since to date we 
have not employed such an interview measure 
we cannot compare our results with Lundy’s 
in regard to this factor. 

The results of Lundy’s factor analysis and 
our own are interesting for one additional 
reason. Although each study was replicated, 
Lundy used one S interviewed 19 times; we 
used 60 Ss interviewed once. Interestingly, 
the results of both studies indicate that dura- 
tion of silence and duration of action are the 
most stable (and independent) factors in two- 
person communicative interactions. 


Summary 


This study employed 60 outpatient Ss in- 
terviewed under conditions of a previously 
described standardized interview. The results 
indicate that, as viewed from an Interaction Chronograph framework,
doctor-patient interactions (and possibly most other two-person
interactions) consist of two very stable factors for any given
individual: (a) how long on the average he or she waits or remains
silent before communicating, and (b) the number and average
duration of each of these communicative interactions. A third factor,
the frequency with which one initiates or 
starts again with another communication unit 
of his own when his partner has not answered 
him, while weak, nevertheless may prove to 
be more stable as the length of time during 
which it is assessed (only during Period 2 in 
our partially standardized interview) is in- 
creased. A fourth factor, the efficiency with 
which a member of the communicating pair 
synchronizes and adjusts (or maladjusts) to 
his partner, was evident as a factor (in both 
our study and Lundy’s), but it, too, was not 
as clear-cut as were the first two factors. The
same holds true for a weak and possibly fifth 
factor, Dominance. Doctor-patient or other 
two-person patterns of dominance-submission 
may reflect an additional important interper- 
sonal dimension and thus this fifth factor- 
variable (again measured in only one sub- 
period of our present partially standardized 
interview) will require the development of 
more refined methods of measurement, or an 
extension of the total time during which our 
dominance variable is measured. 

Further evidence for the validity, or psycho- 
logical meaningfulness, of four of these fac- 
tors (all but Pt.’s Adjustment) comes from a 
recently completed and cross-validated study 
of the psychological test correlates of our 
interview Interaction Chronograph measures 
(12). In terms of the standard psychologi- 
cal test measures (Rorschach, WAIS, Taylor 
Anxiety Scale, etc.) used in the latter study, 
different psychological test-interview correla- 
tions were found for patients who: (a) were 
high or low in their interview silences, (b)
were high or low in their interview actions, 
(c) showed a high or low frequency of initia- 
tives, and (d) submitted to or dominated the 
interviewer. 

A second study (13), now undergoing cross- 
validation, between certain aspects of the ver- 
bal content of the interviews and the overt 
Interaction Chronograph scores, likewise has 
identified interview content (descriptions of 
self and others) differences between patients
who: (a) talk for long or short durations,
(b) have long silences and corresponding high
negative maladjustments, (c) show a high or 
low frequency of initiatives, and (d) earn 
high or low dominance scores. 

Thus it would appear that further evidence 
for the validity of the factors identified by 
our cross-validated factor analysis is coming 
from a variety of additional sources. It is our 
belief that the results of the present study 
represent merely the smallest number of fac- 
tors which can account adequately for inter- 
view interaction behavior, as this is assessed 
solely by the Interaction Chronograph. It is 
expected that a factor analysis which included 
not only intercorrelations of the 12 Interac- 
tion Chronograph variables herein used but 
also the additional intercorrelations of each 
of these variables with such other data as 
psychological test scores, interview content 
measures, Bales Interaction Process scores, 
etc. would yield still further stable interview 
interaction factors, and thus add to our 
knowledge of interpersonal behavior. 


Received July 28, 1958. 
Problems of Reliability in Observing and
Coding Social Interactions¹,²


Allen T. Dittmann 
National Institute of Mental Health 


In observing social interactions and coding 
reports of the observations, problems of re- 
liability arise at every turn. This paper re- 
ports a series of reliability studies which have 
been carried out at the Child Research Branch, 
NIMH, where a variety of children have been 
observed in “naturalistic” settings within a 
residential treatment framework. Various as- 
pects of the findings are relevant to different 
substantive studies within the research pro- 
gram, but the reliability tests are better 
understood when presented as a whole. 

The basic data are reports of 5 to 10 minute 
observations of children in social situations, 
dictated from memory immediately after the 
observations. The protocols are coded in the 
“Circle” of interpersonal mechanisms (1) by 
pairs of coders working together. Reliability 
of coding in this paper means agreement be- 
tween two independent coder pairs, or one 
pair repeating coding with at least six months 
intervening. Reliability of observing means 
two observers reporting independently on the 
same situation. 

In judging which behaviors reported by the 
observer are interpersonal rather than self- 
directed acts, agreement is moderately high. 
Disagreements occur where the observer’s 
descriptions lack detail and where the inter- 
actions involved in the acts are equivocal. 


1 An extended report of this study may be obtained 
without charge from Allen T. Dittmann, National 
Institute of Mental Health, Bethesda 14, Maryland, 
or for a fee from the American Documentation In- 
stitute. Order Document No. 5740, remitting $1.25 
for microfilm or $1.25 for photocopies. 

2 The author wishes to thank D. Wells Goodrich 
and Harold L. Raush for their help throughout these 
studies. 


The reliability of coding protocols item by 
item is far greater than chance expectancy, 
and yet the degree of association is only 
moderately high. The reliability of profiles 
made up of the codings of many interactions, 
on the other hand, is very high, both in repeat 
reliability and in agreement of independent 
pairs of coders. The intensity dimension yields 
similar results: agreement is clearly beyond 
chance, but only groups of many such ratings 
can be used with confidence. 

Reports of two observers reporting on the 
same situations have also been studied. While 
there is only general agreement on individual 
acts and on the sequence of events, profiles of 
codings based on the reports are just as simi- 
lar as are those based on independent codings 
of the same reports. 

Conclusions: The reliability of these meas- 
ures is analogous to that of tests. Items of low 
reliability may be combined to yield tests of 
high reliability. Where profiles of individuals 
or groups are to be compared, the system we 
have used is very adequate. Where sequences 
of individual acts are to be studied, however, 
as in the details of symptomatic behavior, the 
number of sequences must be large enough 
that the low item reliability does not detract 
from any generalizations drawn from the find- 
ings. 
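The point that items of low reliability can be pooled into a composite of high reliability can be illustrated with the classical Spearman-Brown formula. The report itself does not name a formula, so its use here is an assumption, as is the treatment of individual codings as roughly parallel "items"; the item reliability of .30 is a hypothetical value chosen only for illustration.

```python
def spearman_brown(item_reliability: float, n_items: int) -> float:
    """Projected reliability of a composite built from n_items parallel items.

    Classical Spearman-Brown formula: r_k = k * r / (1 + (k - 1) * r).
    The formula is an assumption here; the source does not name one.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * item_reliability / (1 + (n_items - 1) * item_reliability)


# Hypothetical single-coding reliability of .30 (not a figure from the report).
for k in (1, 5, 20, 50):
    print(k, round(spearman_brown(0.30, k), 3))
```

Under these assumptions the projected reliability rises steadily with the number of codings pooled, which is in line with the contrast drawn above between item-by-item agreement and profile agreement.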


Brief Report. 
Environmental Factors and Outpatient Clinic Intake 


Marvin Hyman and Julian Wohl 
VA Mental Hygiene Clinic, Detroit 


In the everyday functioning of a mental 
hygiene clinic or agency, marked fluctuations 
can be observed in the number of people who 
present themselves for the available services. 
Each agency develops its own explanation for 
this phenomenon with the result that there 
currently exists a number of untested hypothe- 
ses to explain variations in daily intake. These 
hypotheses are varied and sometimes colorful. 
The authors have heard explanations that 
hypothesize relationships between daily intake 
and weather conditions, unemployment, atom 
bomb explosions, crime rate, sputniks, etc. 
Although these hypotheses have been un- 
tested in the main, they are nevertheless the 
product of observations of experienced clinical 
workers and deserve some evaluation. 

It was the purpose of this study to examine 
systematically the daily and monthly varia- 
tions in the number of persons who present 
themselves for mental hygiene services and to 
investigate the possible relationships that 
existed between these variations and (a) cer- 
tain climatological conditions and (b) certain 
aspects of the employment situation. Specifi- 
cally, the following hypotheses were tested: 
(a) daily intake varies systematically accord- 
ing to the day of the week and the month of 
the year, (b) daily intake is related to daily 
climatological conditions, e.g., temperature, 
precipitation, sunshine, etc., (c) daily intake 
is correlated with the rate of local unemploy- 
ment. 

In order to test the foregoing hypotheses, 
the daily intake of the Detroit VA Mental 
Hygiene Clinic for the year 1956 was studied. 
Daily intake was defined as the number of in- 
dividuals presenting themselves on a particular 
day, without appointment, for the various 
services offered by the clinic. Patients who 


were seen on the basis of regularly scheduled 
appointments were not included in the count. 

From information supplied by the U. S. 
Department of Commerce Weather Bureau 
(1), it was possible to select three variables 
which grossly reflected weather conditions on 
a particular day. These were: Average tem- 
perature, total precipitation, and percentage of 
possible sunshine. 

The Michigan Employment Security Commission¹ 
supplied two indices of the employ- 
ment situation in the Detroit Metropolitan 
Area for 1956: Total unemployment estimate 
(by months) and the number of claims filed 
for unemployment compensation (by weeks). 


Results 


In 1956 there were 252 working days. The 
median intake over the year was 7.43 patients 
per day. Reference to Table 1 reveals that 
heavier intake (more than seven patients) 


Table 1 


Chi Square Test of Intake Variation by 
Day of the Week 


Day of the week 


Intake 


Under the median 15 
Over the median 


20 37 2 2 
3 


Note.—Chi square = 23.238; df = 4; p = .001. 


was most likely to occur on Mondays, and 
least likely on Wednesdays. Similarly, refer- 
ence to Table 2 indicates that heavier intake 
days were most likely to occur in the months 
of September, November, and March; and 


1 Personal communication. 
Table 2 


Chi Square Test of Intake Variation by 
Month of the Year 


Number of working days 


Note.—Chi square = 30.281; df = 11; p = .003. 


were least likely to occur in the months of 
June, February, and December. 

The effect of weather conditions upon in- 
take was submitted to chi square analyses with 
the following results: Temperature was re- 
lated to intake in a curvilinear fashion, i.e., 


heavier intake tended to occur on those days 
of the year when the temperature was mod- 
erate (second and third quartiles of the tem- 
perature distribution) and was least likely 
when it was very cold or very warm (first and 
fourth quartiles). The chi square was 4.036 
with 1 df, p = .04. The presence or absence of 
rain, snow, and/or sunshine seemed not to be 
significantly related to intake. 
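The chi-square tests reported above (intake by day of the week, by month, and by temperature quartile) all rest on the same computation over a table of "under the median" and "over the median" counts. The following is a minimal sketch of that computation. The function name and the counts in the example are hypothetical and are not taken from the study's tables.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts, e.g. one row for days
    under the median intake and one row for days over it.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts for five weekdays (illustration only).
under = [20, 25, 30, 26, 25]
over = [30, 25, 20, 24, 25]
stat, df = chi_square([under, over])
print(round(stat, 3), df)
```

A two-row table with five columns gives 4 degrees of freedom and one with twelve columns gives 11, which agrees with the df values in the notes to Tables 1 and 2.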

The analysis of the relationship between 
intake and conditions of employment revealed 
that intake is unrelated either to the num- 
ber of unemployed during the year (rho = 
-.04), or to the number of claims for unem- 
ployment compensation that are filed (rho = 
.02). 

We suspect, but have no data to support 
this, that Monday is the heaviest intake day 
because of a backlog of applicants which ac- 
cumulates over the weekend. Another guess is 


that Wednesday typically has few intakes be- 
cause of a phenomenon peculiar to Detroit. 
On that day in Detroit most physicians do not 
maintain office hours and private clinics are 
closed. We would guess that the community 
has generalized from this fact and lives as if it 
applied to the VA Clinic. 

The mythology of the outpatient clinic has 
suggested that unemployment in an industrial, 
one-industry area would affect intake activity 
in an outpatient mental health agency. The 
agency, so runs this argument, deals primarily 
with marginal people of low socioeconomic 
class who would be more likely to come to the 
agency when times get hard. Although it is 
true that the clinic population is comprised 
largely of individuals from low socioeconomic 
classes, the results of this study show that 
whatever accounts for fluctuation in requests 
for clinic services, it is not the labor situation. 
Like the U.S. mail, our clientele was not dis- 
couraged by rain, sleet, snow, or hail, although 
when the weather became very hot, it tended, 
unlike Noel Coward’s Englishmen, “to stay 
out of the midday sun,” and when it was ex- 
tremely cold, wisdom overcame anxiety, and 
the clients remained close by what passed for 
their firesides. 

From these results, it can be concluded that, 
by and large, the applicants for the services 
of this clinic come because of needs for help 
which are not based on situational factors 
such as unemployment and weather conditions. 


Summary 
This study tested the hypotheses that 
weather and employment conditions affected 
the applications for treatment in a VA mental 
hygiene clinic. With some exceptions, the ob- 
tained data did not support these hypotheses. 
          Under     Over 
Month     median    median 

Jan.        12         9 
Feb.        13         7 
Mar.         8        14 
Apr.         8        13 
May         11        11 
June        15         6 
July        17         4 
Aug.        14         9 
Sept.        3        16 
Oct.        13        10 
Nov.         6        14 
Dec.        11         8 
The analysis to be reported represents a 
partial approach to two current clinical prob- 
lems: (a) statistical validation of relation- 
ships among specific projective test scores or 
signs; and (b) measurement of relationships 
between psychological variables and different 
types of medically diagnosed physical pathol- 
ogies. 

Both of these somewhat more theoretical 
questions were derived from an immediate 
practical problem of the need for a set of 
rigidly definable and defensible bases for the 
identification of specific test variables to be 
used in a comparison of significant psycho- 
logical differences between two groups of 
equally physically handicapped male veterans 
(3). 

While several factorial studies of projective 
techniques have been reported in the litera- 
ture, these analyses have been directed pri- 
marily to the structure of a given test rather 
than to the intertest factorial composition of 
individual psychological variables. Thus, there 
are Guertin’s analyses of the Szondi and 
Bender-Gestalt (4, 5, 6, 7, 8, 9) and the 
studies of the Rorschach (1, 2, 10, 11), but 
as far as the authors know there have been 
no factor analysis studies reported of variables 
from more than one projective technique. 

The literature describing relationships be- 
tween physical pathology and personality 


functioning is vast. Extensive efforts have 
been made to demonstrate personality types 
associated with particular physical disabilities. 
In less intensive fashion, a great deal of work 
has occurred describing the relationship be- 
tween kind of disability and adjustment to it. 
Numerous papers have appeared regarding 
prediction of specific behaviors in the areas of 
hospital adjustment, response to medical treat- 
ment, or posthospital adjustment. However, 
in none of these reports were medical and psy- 
chological data integrated into a correlational 
matrix and factored. 

Therefore, the major purpose of this study 
was to verify by means of multiple factor 
analysis, the relationships among the projec- 
tive test scores and signs previously considered 
to be measures of seven specific psychological 
variables. A secondary purpose was to deter- 
mine, also by means of multiple factor analy- 
sis, the relationships among four types of 
physical pathology and these same projective 
test scores. 


Procedure 


The total sample consisted of 105 male 
members of the VA Center, Martinsburg, 
West Virginia Domiciliary and 45 equally 
physically disabled male veterans residing in 
West Virginia who had never applied for ad- 
mission to any VA Domiciliary or similar type 
Table 1 

Variables Included in Initial Correlation Matrix 

A. Test scores from the four projective tests: 

No.   Test                  Definition 

Withdrawal 
 1*   Bender-Gestalt        Upper or left margin placement of all 9 figures 
 2    Bender-Gestalt        Figure #2, the line of 3 circles straight or left slanted, not right 
 3    Person Drawing        Hands omitted 
 4*   Rorschach             Total % human content 
 5    Rorschach             Total sum color (.5FC + 1CF + 1.5C) 
 6    Sentence Completion   Total press-rejection score 
 7    Sentence Completion   Total need-affiliation score 
 8*   Sentence Completion   Total need-infavoidance score 

Energy or Activity 
 9    Person Drawing        Legs or arms uneven in length or width 
10    Rorschach             Total animal movement score (FM) 
11*   Rorschach             large de[tails] 
12*   Bender-Gestalt        Figure #6, 1½ times or more enlarged 
13*   Bender-Gestalt        Figure #8, 1½ times or more enlarged 

Feelings of Invalidism 
14*   Rorschach             Total animal object content score 
15    Rorschach             Total anatomical content score 
16    Rorschach             Total plant content score 
17    Rorschach             Total number of dysphoric responses 
18    Person Drawing        Unseeing eye (drawn as empty circle) 
19    Person Drawing        Body shading other than hair or clothing details present 
20    Person Drawing        Legs or arms drawn as single straight line 
21*   Person Drawing        Legs omitted 
22    Sentence Completion   Press-physical lack 

Anxiety (Tension) 
23    Rorschach             Total number of cards rejected 
24    Rorschach             Total score of inanimate movement (m) 
25    Rorschach             Total score of diffuse shading (K) 
26*   Rorschach             Total score of shading as two-dimensional perspective (k) 
27    Bender-Gestalt        Erasures present in one or more figures 
28    Bender-Gestalt        Retracing present in one or more figures 
29*   Person Drawing        Person drawn as off-balance 
30    Sentence Completion   Total score need-defendance 

Depression 
31    Rorschach             Sum of achromatic color (C') 
32*   Bender-Gestalt        Figure #2, horizontal slanted down to right 
33    Bender-Gestalt        Figure #3 rotated down (45° to 225°) 
34*   Bender-Gestalt        Figure #6, horizontal curve slanted down to right 

Dependency-Passivity 
35    Rorschach             Total sum of oral content 
36*   Bender-Gestalt        Figure #7, left side rotated down 
37*   Person Drawing        Center line of buttons indicated 
38*   Sentence Completion   Total score press-nurturance 
39    Sentence Completion   Total score need-passivity 
40*   Sentence Completion   Total score need-succorance 

Ambition 
41    Rorschach             Total sum of small details (d) 
42*   Rorschach             Total number of responses (R) 
43    Rorschach             Eagle or emblem content present in one or more cards 
44    Person Drawing        Additional non-clothing details present 
45    Sentence Completion   Total score need-achievement 
46    Sentence Completion   Total score need-recognition 
47*   Person Drawing        Neck omitted 

B. Other variables: 

48    Orthopedic pathology present 
49    Respiratory pathology present 
50    Cardiac pathology present 
51    Neurological pathology present 
52    IQ score from group-administered Otis S.A., Intermediate level, Form A 
53    Physical fitness test score 

* Subsequently dropped from the factor analysis matrix. 
of institutional living. Each subject had a 
definite medical diagnosis of one or more of 
the following four disease processes: ortho- 
pedic, as verified by X-rays; respiratory, 
as verified by X-rays and other laboratory 
studies; cardiac, as verified by both electro- 
cardiograph and blood studies; neurological, 
as verified by either spinal tap or electro- 
encephalogram. Incidence of pathology for 
the total sample of 150 cases was: orthopedic, 
43%; respiratory, 28%; cardiac, 38%; neu- 
rological, 16%. 

A completed battery of the following five 
psychological tests, group administered, was 
available for each of the 150 cases: Otis S.A. 
Test of Mental Ability, Intermediate Exami- 
nation, Form A; Bender-Gestalt figure draw- 
ings (from slides); Rorschach; Drawing of 
a Person; and the Rhode-Hildreth Sentence 
Completion Test. One other measure also 
available was a quantitative score of physical 
fitness capacity derived from a performance 
test developed by the Chief of Physical Medi- 
cine and Rehabilitation (3). 

From the four projective tests 47 scores 
were selected which, on the basis of previous 
studies and of clinical experience, presumably 
reflect personality variables of significance in 
relation to the comparison of differences be- 
tween the Domiciliary members and the com- 
munity control sample. Such variables in- 
clude: withdrawal, energy, feelings of invalid- 
ism, anxiety, depression, dependency-passivity, 
and ambition. Table 1 summarizes the 53 
variables in the original correlation matrix. 

Tetrachoric correlations were computed 
among all of the 53 variables. In the case of 
quantitative test scores, the median was used 
as the cutting point. In order to have a matrix 
of a size more feasible to factor, 18 of the 47 
psychological test scores were dropped either 
because of essentially zero correlations or cor- 
relation coefficients above .75, with one or 
more of the variables included. 
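The median split followed by a tetrachoric correlation can be sketched as below. The paper does not say how the tetrachoric coefficients were obtained, so the cosine-pi approximation used here is an assumption standing in for whatever computing aid the authors used, and the function names and the example scores are hypothetical.

```python
import math
from statistics import median


def dichotomize(scores):
    """Cut quantitative scores at their median: 1 if above, else 0."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def tetrachoric_cos_pi(x, y):
    """Cosine-pi approximation to the tetrachoric correlation.

    x and y are equal-length lists of 0/1 values. With the fourfold
    table a = (1,1), b = (1,0), c = (0,1), d = (0,0), the approximation
    is cos(pi / (1 + sqrt(a*d / (b*c)))). This is an approximation,
    not necessarily the procedure used in the study.
    """
    pairs = list(zip(x, y))
    a = sum(1 for p in pairs if p == (1, 1))
    b = sum(1 for p in pairs if p == (1, 0))
    c = sum(1 for p in pairs if p == (0, 1))
    d = sum(1 for p in pairs if p == (0, 0))
    if b * c == 0:
        if a * d == 0:
            raise ValueError("fourfold table is degenerate")
        return 1.0  # no discordant pairs in one direction
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))


# Hypothetical scores for eight subjects on two tests.
test_one = dichotomize([3, 9, 8, 7, 1, 2, 6, 4])
test_two = dichotomize([2, 9, 7, 3, 8, 1, 6, 4])
print(round(tetrachoric_cos_pi(test_one, test_two), 3))
```

Variables that are already dichotomous, such as the presence or absence of a sign or of a type of pathology, would enter the fourfold table directly without the median cut.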

The 35 × 35 matrix was factored by the 
Thurstone Complete Centroid Method. The 
factor loadings, after ten oblique rotations, 
are given in the following section. A cutting 


1 The community control group was obtained 
through the cooperation of the West Virginia State 
Board of Vocational Education, Division of Voca- 
tional Rehabilitation. 
point of ±.35 has been used, except in three 
cases where the pattern of correlations seemed 
to justify going down to .32 or .33. Listed 
first are the variables loading uniquely on the 
given factor; then are listed, with the variable 
number in parentheses, those variables loading 
±.35 or above on other factors also. 


Results 


Factor A 

No.    Variable                                      Loading 

3      Hands omitted (Person Drawing)                  .63 
7      Need-affiliation (Sentence Completion)         -.49 
[23]   Card rejections (Rorschach)                    -.64 
[17]   Dysphoric responses (Rorschach)                 .53 
[20]   Single line legs or arms (Person Drawing)      -.46 
[51]   Neurological pathology present                  .36 

On the basis of the two test variables which 
have significant loadings for this one factor 
alone, Factor A is tentatively identified as a 
“withdrawal” variable. The significant posi- 


tive loading for neurological pathology pres- 
ent is consistent with clinical observations. 


Factor B 

No.    Variable                                      Loading 

27     Erasures (Bender-Gestalt)                       .49 
22     Press-physical lack (Sentence Completion)      -.47 
[51]   Neurological pathology present                  .72 
[30]   Need-defendance (Sentence Completion) 
[9]    Uneven legs or arms (Person Drawing)            .35 
[15]   Anatomical content (Rorschach)                 -.54 
[24]   High inanimate movement (Rorschach)             4 


Factor B is tentatively identified as a “ten- 
sion” factor. The negative loadings for two 
scores usually interpreted as methods of 
defense against anxiety [22, 15] would be 
consistent with this hypothesis. 


Factor C 

No.    Variable                                      Loading 

53     Physical fitness test score 
44     Additional details (Person Drawing)             .41 
41     Small details (Rorschach)                       .37 
6      Press-rejection (Sentence Completion)           .33 
[43]   Eagle or emblem content (Rorschach)             .59 
[46]   Need-recognition (Sentence Completion) 
[20]   Single line legs or arms (Person Drawing)      -.47 
[23]   Card rejections (Rorschach)                    -.47 
[9]    Uneven legs or arms (Person Drawing)           -.41 
35     Oral content (Rorschach)                        .41 


The combination of variables with significant 
loadings on Factor C would suggest a 
tentative identification as “needs for social 
approval or attention” rather than either energy 
or ambition. 


Factor D 

No.    Variable                                      Loading 

25     Diffuse shading (Rorschach)                     .32 
[19]   Body shading (Person Drawing)                   .76 
[28]   Retracing (Bender-Gestalt)                      .68 
[15]   Anatomical content (Rorschach)                 -.56 
[49]   Respiratory pathology present                  -.50 

Factor D would appear to be an “anxiety” 
factor as differentiated from “physical ten- 
sion” reflected in Factor B. 


Factor E 

No.    Variable                                      Loading 

39     Need-passivity (Sentence Completion)            .39 
[51]   Neurological pathology present                  .93 
[28]   Retracing (Bender-Gestalt)                      .60 
[30]   Need-defendance (Sentence Completion) 
[19]   Body shading (Person Drawing)                   .41 
[43]   Eagle or emblem content (Rorschach)             .41 
[9]    Uneven legs or arms (Person Drawing)            .39 
[17]   Dysphoric responses (Rorschach)                 .38 

On the basis of the variable with the high- 
est unique loading for Factor E, it is tenta- 
tively identified as “dependency-passivity.” 


Factor F 

No.    Variable                                      Loading 

5      High weighted sum color (Rorschach)             .71 
16     Plant content (Rorschach)                       .42 
[9]    Uneven legs or arms (Person Drawing)           -.46 
[49]   Respiratory pathology present                  -.34 


Factor F is possibly an indication of affec- 
tive lability. 


Factor G 

No.    Variable                                      Loading 

52     IQ                                              .73 
2      Figure #2, straight or slanted to left 
       (Bender-Gestalt)                                .45 
50     Cardiac pathology present                       .44 
10     High animal movement (Rorschach)                .41 
[19]   Body shading (Person Drawing)                   .65 
[24]   High inanimate movement (Rorschach)             .43 


The combination of variables with signifi- 
cant loadings on this factor is clinically in- 
teresting as well as puzzling to some extent. 
One hypothesis is that there may be a kind 
of system of pseudo-intellectual defenses avail- 
able to persons with better than average in- 
telligence. 
There were five variables, 18, 31, 33, 45, 
and 48, which had no significant loading on 
any of the seven factors. 

The seven factors were intercorrelated. 
Most of the correlations were low and nega- 
tive. Only two, between B and E (—.49) and 
between C and F (—.44), were above .35. It 
is, therefore, quite unlikely that any second- 
order factors would be found. 


Discussion of Results 


The factor structure that emerges from the 
oblique rotations appears to be fairly well 
defined. Of the originally postulated seven 
psychological variables for which projective 
test scores were selected, three were relatively 
easily identifiable as factors, namely: with- 
drawal (Factor A), anxiety (Factor D), and 
dependency-passivity (Factor E). Four of the 
postulated psychological variables did not 
emerge as factors: depression, ambition, energy, 
and feelings of invalidism. While it is 
possible that adequate projective test indices 
of these variables were not included in the 
factor matrix, it is of interest that three some- 
what related factors were identified: affective 
lability (Factor F), needs for social approval 
or attention or prestige (Factor C), and 
pseudo-intellectual defenses (Factor G). Not 
previously anticipated was the appearance of 
a physical tension factor (Factor B) as dif- 
ferentiated from an anxiety factor. 

It can be noted that neither a test nor a 
medical pathology type of factor structure 
was found. That is, there was not found a 
series of factors each loaded with indices from 
a single test, or four factors each with one 
medical pathology variable showing a high 
loading. The pattern of factor loadings for 
the four types of medical pathology would be 
consistent with the theory that personality 
variables are not associated with specific types 
of physical disease. It may very well be that 
the personality variables are related to kinds 
of reaction to illness or ability, although this 
can not be concluded from evidence in the 
present study. 

We find that orthopedic pathology has no 
significant loadings on any of the seven fac- 
tors; respiratory pathology has negative load- 
ings on Factors F and D; cardiac pathology 
shows a positive loading on Factor G; neu- 
rological pathology has high loadings on Fac- 
tors A, B, and E. 


Summary 

Forty-seven scores or indices were selected 
from four different projective tests and inter- 
correlated; also included in the correlation 
matrix were medical diagnoses of four types 
of physical pathology, IQ score, and a score 
of physical fitness capacity. Eighteen of the 
test scores were eliminated, either because of 
zero correlations or very high correlations 
with other variables, and the remaining 35 × 
35 matrix was factored. 

Seven factors emerged which are tentatively 
identified as: withdrawal, physical tension, 
needs for social approval or prestige, anxiety, 
dependency-passivity, affective lability, and 
pseudo-intellectual defenses. The factor struc- 
ture seemed unrelated to either the kind of 
test or the type of medical pathology. 

Presence of orthopedic pathology did not 
show a significant loading on any of the seven 
factors. Respiratory pathology present had 
significant loadings on not labile and not 
anxious. Cardiac pathology present showed a 
significant loading on intellectual defenses; 
neurological pathology present had high load- 
ings on withdrawal, physical tension, and de- 
pendency-passivity. 

The results of the analysis would tend to 
indicate a partial validation for some of the 
clinical interpretations made from selected 
projective test variables and to support the 
hypothesis that personality factors are not 
associated with specific types of physical 
disease. 

Received June 9, 1958. 
A number of studies have pointed to the 
impact of the “schizophrenogenic” parents, 
especially mothers (e.g., 3, 5, 8, 12, 16, 17, 
18, 19), and other studies are indicative of 
pathological configurations in Rorschach re- 
sults of relatives of schizophrenics (e.g., 13, 
14). Further examination of the proposition 
that pathology can be discovered in the 
Rorschach performances of mothers of schizo- 
phrenics is in order. Such mothers are ex- 
pected to show relatively severe manifesta- 
tions as compared with mothers of nonschizo- 
phrenic offspring. 

In 1950, Prout and White published one of 
the few studies in which there was a com- 
parison of reasonably well matched groups of 
mothers of schizophrenics and mothers of non- 
schizophrenics. Prout and White studied, by 
means of personal interview, the dominant 
orientations of all these mothers toward their 
sons in such areas as child-rearing attitudes 
and also recall of the various actual events 
and adjustments of the growing child. Addi- 
tionally, a Rorschach test was administered 
to each mother in an effort to learn some- 
thing about her personality pattern. Essen- 
tially, the findings of Prout and White were 
these: (a) the mothers of the schizophrenics 
seemed to be overprotective and overcon- 
trolling in contrast to the control mothers, 
(b) the experimental mothers had somewhat 
higher and more unrealistic levels of aspiration 
for their sons and subordinated the happiness 
of the child to such considerations, 
(c) yet, paradoxically, their demands on their 
sons were somewhat less explicit than the 
demands made by the control mothers, and 
(d) clinical assessments of the Rorschachs indicated 
that the experimental mothers manifested 
more pathological personality configurations. 


1 We express our gratitude to C. T. Prout and 
Mary A. White, of the New York Hospital, West- 
chester Division, White Plains, New York. Their 
data and cooperation made the study possible. 

It is this last finding which leads to the 
effort to make a more particularistic state- 
ment of the difference in personality charac- 
teristics, based on the Rorschach perform- 
ances of these two groups of mothers. The 
Prout and White data can be utilized in a 
test of the hypothesis that the “level of per- 
ceptual organization” aspect of the complex 
pathology called schizophrenia is more evi- 
dent in the mothers of schizophrenics than 
among mothers of normals. 


Method 


The Prout and White sample consisted of 
50 women, half of them being the experi- 
mental group who were mothers of young 
adult males diagnosed as schizophrenic, and 
the other half consisted of a control group of 
mothers of normal sons. In general, Prout 
and White made the two groups comparable 
on several usual matching variables. How- 
ever, there were some differences in chrono- 
logical age and about one-third of the ex- 
perimental mothers had been born in for- 
eign countries. The Rorschach protocols were 
coded by White and sent to the present authors, 
who rated each protocol for “personality-age-level.” 

“Personality-age-level” ratings are based on 
the viewpoint that the Rorschach perform- 
ance can be scored for maturity of personal- 
ity development by rating the protocol on a 
scale (a five-point scale in the present study) 
on which points are defined by the average 
perceptual organization levels of successive 
representative chronological age groups be- 
ginning with very young children and ex- 
tending to the adult group. In other words, 
it is assumed that the level of organization 
of Rorschach responses is associated with the 
level of organization of adjustment in life. 
Since the rating points are defined in terms 
of average Rorschach performances, the defi- 
nitions of rating points are limited by the 
adequacy of normative data. In the present 
study, norms relied upon included those of 
Halpern (6), Kay and Vorhaus (9), Klopfer 
and Margulies (10), McFate and Orr (11), 
and Cass and McReynolds (2). 

After each judge had rated every protocol 
(there were 25 experimental and 23 control 
records available), the key to the code was 
obtained and the statistical tests for differ- 
ences between the two groups of mothers 
were made. 


Results 


The judges were permitted any gradations 
between 1 and 5, or in other words were not 
required to use only the points 1, 2, 3, 4, or 
5. The product-moment correlation between 
the two sets of judges’ ratings is .83. A mean 
rating based on the two judges’ ratings for 
each protocol was adopted as the best avail- 
able score. On the five-point scale where 5 is 
the most immature and 1 is the most ma- 
ture, the mean rating for the experimental 
group is 2.89 and that for normals is 2.43. 
The difference between means yielded a t 
of 3.08, p < .01. There is considerable over- 
lap of the groups, as is expected. Taking the 
median of the combined groups as a reference 
point, 10 mothers of schizophrenics and 14 
mothers of normals are above, and 15 moth- 
ers of schizophrenics and 9 mothers of nor- 
mals are below, the median. 
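The overlap described by this median split can be summarized in a fourfold table and reduced to a single chi-square value. The sketch below uses the four counts just given; the statistic it computes is not one reported in the paper, which relies on the t test above and on the 2 × 4 chi-square tests that follow, and the function name and the absence of a continuity correction are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Shortcut formula: N * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: above / below the combined median.
# Columns: mothers of schizophrenics / mothers of normals.
stat = chi_square_2x2(10, 14, 15, 9)
print(round(stat, 3))
```

A two-way cut at the median is a coarser summary than the mean ratings or the quartile tables, so this value serves only to illustrate how the overlap can be put in numbers.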

For each of the two sets of ratings, a chi- 
square test for relationship between group 
and rating was made. The four rating categories 
of the 2 × 4 chi-square table were the 
quartiles of the combined distribution of the 
48 ratings made by each judge. For one 
judge, the chi square was significant between 
the .02 and .01 levels. For the other judge, 
the significance level was between .05 and 
.02. 


Discussion 

Since there is a statistically significant dif- 
ference between the mean age-level ratings 
for the experimental and control groups, it is 
concluded that among mothers of schizo- 
phrenic sons there is a disproportionate num- 
ber who manifest pathology. This pathology 
is tentatively defined as immaturity of per- 
ceptual organization. It is assumed that this 
immaturity is associated with detrimental de- 
velopmental conditions and is manifest in 
pathogenic conduct in parent-child relation- 
ships which has contributed significantly to 
the schizophrenic reactions of the sons. 

Since the distributions of the highly reli- 
able ratings show considerable overlap, it is 
assumed that the experimental mothers are 
only quantitatively deviant from the cultur- 
ally normal range of conduct. Several moth- 
ers of nonschizophrenic sons manifest severe 
pathology. Their less debilitating effect is ex- 
plained on the basis of assumed compensatory 
relationships of their sons with others. As 
some experimental mothers manifest mini- 
mal pathology, alternative hypotheses must 
be invoked to account for the development of 
schizophrenic reactions in these sons. 

The approach used in evaluating the Ror- 
schachs is apparently very similar to a sys- 
tem developed at Clark University and re- 
ported by Friedman (4), Siegel (18), Wyatt 
(20), Pena (15), and Hemmendinger (7). A 
similar system was used by Becker (1). Since 
the direct comparison of scoring one set of 
protocols by all of these methods was not 
carried out, no firm conclusion is justified, 
as yet, regarding the comparability of such 
scores. 


Summary 


Rorschach protocols of 48 women, 25 of 
whom were mothers of young adult male 
schizophrenics and 23 mothers of normal
sons, were scored for personality-age-level,
i.e., maturity of personality.

Within the limitations of this study, the 
following conclusions are drawn: 

1. Personality-age-level Rorschach scoring 
can be done with good reliability. 

2. As a group, mothers of schizophrenic 
sons manifest more pathology than a group 
of mothers of normal sons. 


Received December 2, 1957.


There is a great deal of inconsistency in the
Rorschach literature among statements about 
the incidence of Rorschach card rejection 
within the various nosological groups and 
about the determining factors proposed to ac- 
count for it. In Rorschach’s (6) discussion of 
the incidence of rejections he stated that nor- 
mal subjects almost never fail to give a re- 
sponse, while neurotics occasionally reject, 
and schizophrenics frequently reject the cards. 
He attributed the neurotic’s failure to respond 
to “inhibitions due to complexes.” Beck (1) 
considered rejections to result from “fearful 
needs for certainty,” and he regarded them as 
an indication of defensive behavior, the sub- 
ject thereby trying to shut off either affect or 
intellect. In the schizophrenic, however, he 
considered them as symptomatic of regression. 

Klopfer and Kelley (3) observed that there 
were differential rejection rates associated with 
the various cards. They stated that Card IX 
was the only one frequently rejected by nor- 
mal subjects, with Cards II and VI next in 
order of frequency. It was their belief that the 
rejection of Cards II, IV, VI, and IX has less 
pathological significance than rejection of the 
other cards, that neurotic subjects usually re- 
ject these cards, and that the rejection of more 
than four cards points toward a psychotic dis- 
order. They proposed that while the general 
significance of rejection is a blocking or re- 
sistance against the situation, it may spring 
from a variety of causes such as an intellectual
or emotional incapacity to cope with the situa-
tion or an aggressive negativism.


1 From the VA Hospital, Northampton, Mass. The
author is appreciative of the help in the preparation
of this report given by the staff of the Clinical Psy-
chology Service and by Morton Wiener of Clark Uni-
versity and Maria Rickers-Ovsiankina of the Uni-
versity of Connecticut.

2 Now at the Columbus Receiving Hospital for
Children, Columbus 19, Ohio.

Using the group Rorschach method with 
psychiatric patients to explore favorable prog- 
nostic indications, Scherer (7) found that 
Rorschach card rejection was inversely related 
to spontaneous remission, and that there were 
trends for patients who showed the least num- 
ber of rejections to be the ones most likely to 
improve from electroconvulsive therapy and 
lobotomy. He also found that two weeks after 
lobotomy, rejections decreased in both indi- 
vidual and group methods of administration 
(8). In the context of these studies, rejection 
was considered to signify withdrawal from the 
immediate environmental situation and a de- 
crease in the strength of ego boundaries. 

It is understandable that inconsistent opin- 
ions might arise because of the absence of 
relevant statistical data. However, recent 
studies reporting empirical data on rejection 
by psychiatric groups show considerable agree- 
ment and elucidate the actual incidence of re- 
jections. In Mensh and Matarazzo’s study (4) 
of 201 hospital patients, it was found that 
19% of their psychotics, 24% of the neu- 
rotics, and 37% of the organics rejected one or 
more cards. These values were not signifi- 
cantly different from each other according to 
the chi square test. Rejections occurred pri- 
marily in response to Cards IX, VI, VII, X, 
and IV, which provoked 83% of all rejections. 
In a cross-validation study, Sisson, Taulbee, 
and Gaston (9) found that within their sub- 
ject groups, rejections were elicited from 
33.3% of the neurotics, 32.0% of the schizo- 
phrenics, 17.4% of the normal adults, 10.0% 
of the normal children, and 22.7% of the 
mentally defective children. These differences 
were significant for comparisons within the
adult groups and the juvenile groups, with the
exception that the percentages of rejections by 
the neurotics and the schizophrenics were not 
significantly different from each other. When 
all groups were matched for age and number 
of responses, the only significant difference 
with respect to the rejection data was the 
more frequent rejection of Card X by the 
schizophrenics than by the normals. They 
concluded that although Rorschach card re- 
jection has little or no diagnostic significance 
in terms of the common nosological categories, 
it might be related to personality variables 
which cut across the broad diagnostic clas- 
sifications. It was to explore this possibility 
that the present study was designed. 


Method 


Rorschachs of 109 psychiatric patients at 
the Northampton Veterans Administration 
Hospital for whom concurrent MMPI data 
were also available were randomly selected 
from the files for use in this study. Rejection 
was defined as the failure to offer a scorable 
response to any Rorschach card during the 
performance proper, and the specific cards 
eliciting a rejection were recorded for each 
subject. The personality variables investigated 
as determinants of rejection were defensive- 
ness, psychasthenia, depression, pathology, 
and psychiatric diagnosis. The first three traits 
were suggested as promising variables in view 
of their advocacy by other investigators as de-
terminants of item omission on the MMPI
(10), a type of test behavior presumed to be
the “objective” test counterpart of Rorschach
card rejection. These traits were represented
by the following scales of the MMPI: “Can-
not Say,” K, L, Pt, and D. The F scale was
taken as the measure of pathology, and psy-
chiatric diagnosis was either neurosis or psy-
chosis (predominantly schizophrenia) previ-
ously established by a diagnostic conference
for a subgroup of 96 patients. All MMPI
scales were dichotomized at their respective
medians, and Rorschach rejection was di-
chotomized on the basis of its absence or
presence. A series of two by two contingency
tables was drawn up, and chi square tests
were computed separately for each measure.


Table 1


Frequencies and Percentages of Neurotics and
Psychotics Rejecting Rorschach Cards


               Neurotics       Psychotics
Number          (N = 37)        (N = 59)
rejected       No.     %       No.     %

0              21    56.8      34    57.6
1               5    13.5      12    20.3
2               4    10.8       7    11.9
3               4    10.8       3     5.1
4               1     2.7       1     1.7
5               2     5.4       2     3.4
6-10            0     0         0     0


Note.—The Kolmogorov-Smirnov two-sample test shows no
significant difference at the .05 level between the two percent-
age distributions.


Table 2


Frequencies and Percentages of Neurotics and
Psychotics Rejecting Individual
Rorschach Cards


               Neurotics       Psychotics
Rorschach       (N = 37)        (N = 59)          Total
Card           No.     %       No.     %       No.     %

I               0     0         3     5.1       3     3.1
II              6    16.2       3     5.1       9     9.4
III             2     5.4       1     1.7       3     3.1
IV              3     8.1       5     8.5       8     8.3
V              ...   ...        4     6.8      ...    5.2
VI              3    ...        8    13.6      11    11.5
VII            10    27.0       6    10.2      16    16.7
VIII            2     5.4       3     5.1      ...   ...
IX              7    18.9      ...   ...       16    16.7


Note.—Critical ratio tests for the significance of the differ-
ence between percentages showed a significant difference at the
.05 level only for Card VII, the CR being 2.06.


Results 


None of the chi square tests for the relation- 
ship between Rorschach card rejection and the 
various personality traits reached an accept- 
able level of statistical significance, nor were 
any trends even apparent. Of the neurotic sub- 
sample (N = 37), 43.2% gave one or more 
rejections, and of the psychotics (N = 59),
42.4% did so. Table 1 shows the frequencies 
and percentages of subjects from each group 
rejecting specific numbers of cards. The Kol- 
mogorov-Smirnov two-sample test for differ- 
ences between the two percentage distributions 
failed to reach the .05 level of significance. 
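The Kolmogorov-Smirnov comparison can be retraced from the frequencies in Table 1. The following is a minimal sketch rather than the author's own computation: it forms the two cumulative proportion distributions and compares their largest gap, D, with the usual large-sample .05 critical value, 1.36 * sqrt((n1 + n2) / (n1 * n2)). The variable names and the choice of the large-sample approximation are assumptions made for illustration.

```python
from math import sqrt
from itertools import accumulate

# Table 1 frequencies for 0, 1, 2, 3, 4, 5, and 6-10 cards rejected.
neurotics = [21, 5, 4, 4, 1, 2, 0]
psychotics = [34, 12, 7, 3, 1, 2, 0]

def cumulative_proportions(counts):
    """Running totals of the counts, expressed as proportions of the group."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

cum_n = cumulative_proportions(neurotics)
cum_p = cumulative_proportions(psychotics)

# D is the largest absolute gap between the two cumulative distributions.
d_stat = max(abs(a - b) for a, b in zip(cum_n, cum_p))

n1, n2 = sum(neurotics), sum(psychotics)
# Large-sample two-sided .05 critical value (an assumed approximation).
d_crit = 1.36 * sqrt((n1 + n2) / (n1 * n2))

print(f"D = {d_stat:.3f}, critical D at .05 = {d_crit:.3f}")
print("significant" if d_stat > d_crit else "not significant")
```

Under these assumptions the largest gap falls well short of the critical value, which agrees with the reported failure to reach the .05 level.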


Rorschach Card Rejection by Psychiatric Patients 


Table 2 shows the frequencies and per- 
centages of neurotics and psychotics rejecting 
each individual Rorschach card. Tests of the 
significance of percentage differences for each 
card showed that only for Card VII was the 
difference significant at the .05 level, the 
critical ratio being 2.03. The significance of 
this finding vanishes, however, since one out 
of ten computed statistics attaining the .05
level could result from chance. 
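Both steps of this argument can be checked numerically. The sketch below is illustrative only: it assumes the critical ratio was formed with an unpooled standard error (the report does not say which form was used), and it assumes the ten card-by-card tests are independent when estimating how often at least one would reach the .05 level by chance. The Card VII counts are those given in Table 2.

```python
from math import sqrt

def critical_ratio(x1, n1, x2, n2):
    """Difference between two proportions divided by its standard error.

    The unpooled standard error is used here; that choice is an assumption.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Card VII in Table 2: 10 of 37 neurotics and 6 of 59 psychotics rejected it.
cr_card_vii = critical_ratio(10, 37, 6, 59)

# Chance of at least one "significant" result among ten independent tests
# when every null hypothesis is true and each test uses the .05 level.
alpha, n_tests = 0.05, 10
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"CR for Card VII = {cr_card_vii:.2f}")
print(f"P(at least one of {n_tests} tests reaches .05) = {p_at_least_one:.2f}")
```

With ten tests, the chance of at least one nominally significant result is roughly four in ten, which is why a single critical ratio near 2 carries little weight here.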

Discussion 

This attempt to identify personality traits 
associated with the tendency to reject Ror- 
schach cards failed to find significant rela- 
tionships. Although the omission of MMPI 
items was thought to be the “objective” test 
counterpart of Rorschach rejections, this be- 
lief was unsupported by the data. In agree- 
ment with other studies (4, 9), it was found 
that nosological groups (neurotics and psy- 
chotics) rejected essentially the same number 
of cards with comparable frequencies for 
given numbers of cards and for specific cards. 
The very close consistency with the findings 
of the other empirical studies with respect to 
the differential rejection rates evoked by par- 
ticular cards is illustrated by Table 3. This 
table compares the independently obtained 
orders of the five most frequently rejected 
cards, arranged from highest to lowest in rate 
of rejection. 

It may be seen that among the five most 
frequently rejected cards, Nos. IX, VI, X, and 
VII are common to the findings of all three 
investigators. It is of considerable interest to 
note that Sisson’s normal group is also in- 
cluded in this communality. Closely related 
to these stable differential rejection rates are
the reported observations of initial reaction
times. Phillips and Smith (5), presenting data 
from 59 normal males ranging in age from 20 
to 40 years, showed the following order of the 
five cards eliciting the longest response latency: 
VI, IX, II, X, and IV. Similarly, Beck et al. 
(2) showed the order which they obtained to 
be: IX, VI, X, IV, and II. The overlap be- 
tween the five most rejected cards and the five 
cards eliciting the longest response latency is 
striking enough to suggest a common deter- 
mining factor. Although the identification of 
this factor cannot be established on the basis 
of the available data, a likely possibility would 
be that specific characteristics of these par- 
ticular cards make for difficulty in deriving 
concepts from them. It is a matter of conjec- 
ture, of course, whether these characteristics 
pertain to the structural properties or the emo- 
tional evocativeness of the cards. While it may 
be seen that the degrees of difficulty in re- 
sponding to the cards are the same for differ- 
ent nosological groups, comparison with nor- 
mative data suggests that psychiatric patients 
experience more difficulty than do normals. 


Table 3


Comparison of the Orders of the Five Most Frequently Rejected Cards Derived from Empirical Studies


Study         Subjects                         Order from highest to lowest
                                               1st   2nd   3rd   4th   5th

Tamkin        96 neurotics and psychotics
Sisson (9)    200 neurotics and schiz.
Sisson (9)    190 normals
Mensh (4)     174 neurotics and psychotics

(The card orders in the body of the table are not legible in the source.)


Summary


The Rorschach literature contains consid-
erable inconsistency among statements con-
cerning the incidence and determinants of
Rorschach card rejection within the various
nosological groups. While empirical studies
have elucidated the actual incidence of rejec-
tion, there has been no investigation of pos-
sible personality variables underlying it. This
study sought to explore the question using
measures of personality variables derived from
the MMPI and the variable of psychiatric
diagnosis in a group of 109 hospital patients
containing subsamples of 37 neurotics and 59
psychotics who were predominantly schizo-
phrenic. No relationships were found between
Rorschach card rejection and the variables of
defensiveness, psychasthenia, depression, pa-
thology, and diagnosis. The five most fre-
quently rejected cards were shown to be in
close agreement with the findings of other
empirical studies. They were found to overlap
to a large extent the cards evoking the longest
response latencies, a phenomenon presumably
a function of the relative difficulties of the
cards in permitting the derivation of concepts.
The available data did not permit a decision
about whether the difficulty was derived from
the structural properties or the emotional
evocativeness of the cards. It was shown, how-
ever, that the degrees of difficulty in respond-
ing to the cards are the same for different
nosological groups, and comparison with nor-
mative data suggested that psychiatric pa-
tients experience more difficulty than do
normals.

Received November 11, 1957.


The validity of a test or of a personality in-
ventory is frequently defined as the extent to
which it actually measures what it purports to 
measure. However, criterion measures of va- 
lidity in which data are collected in actual be- 
havioral situations are generally not available. 
With the exception of a recent study by 
Bernadin and Jessor (2), the correlations ob- 
tained thus far are between the Edwards 
Personal Preference Schedule and self-ratings 
or ratings by peers, and other personality in- 
ventories. 

In the study by Bernadin and Jessor, the 
construct of Dependency was defined in order 
to predict the behavior in Asch’s experimental 
situation from behavior on the Autonomy and 
Deference subscales of the EPPS. In the 
study, the authors employed Asch’s technique 
of measuring group behavior as the criterion 
for one of the propositions in the construct, 
Dependency. They suggested that the failure
to find a correlation between behavior in
Asch’s situation and behavior on the EPPS
arose because the subjects were placed in a
concrete reality situation.


In the present study, in order to relate test
behavior with the criterion, the term con-
formity behavior has been defined as the need
not to be different, to follow the opinions and 
suggestions of others, to conform to the group. 
The conformist has less ability to tolerate his 
own impulses, less originality, and a greater 
tendency to follow external and socially ap- 
proved values. 

1 Now at Creighton University Medical School.


According to Edwards, the Personal Prefer-
ence Schedule is a measure of an individual’s
needs or motives, which college students and
adults seek to satisfy in the daily conduct of
their lives. The needs measured by the EPPS
were drawn from the studies of Murray (5).
Edwards’ description of the Autonomy and
Deference subscales of the EPPS is as follows:


Autonomy: To be able to come and go as desired, 
to say what one thinks about things, to be inde- 
pendent of others in making decisions, to feel free 
to do what one wants, to do things that are uncon- 
ventional, to avoid situations where one is expected 
to conform, to do things without regard to what 
others may think, to criticize those in positions of 
authority, to avoid responsibilities and obligations. 

Deference: To get suggestions from others, to find 
out what others think, to follow instructions and do 
what is expected, to praise others, to tell others that 
they have done a good job, to accept the leadership 
of others, to read about great men, to conform to 
custom and avoid the unconventional, to let others 
make decisions (4, p. 5). 


The Autonomy subscale purportedly meas- 
ures the same personality variable, conformity 
behavior, as defined above. Furthermore, there 
is indication that the Deference subscale is 
measuring a trait which is in opposition to the 
trait measured by the Autonomy subscale. 


Procedure 


The purpose of this study was to determine 
the validity of the Autonomy and Deference 
subscales of the EPPS by correlation with a 
criterion measure of conformity behavior. The 
method of measuring conformity behavior is 
a modification of Asch’s basic group situation 
developed for measuring conformity behavior. 


Asch’s (1) technique of measuring conformity be-
havior requires the subjects (Ss) to report which of
three parallel lines is equal in length to a standard 
line. One naive S is placed in a group with three 
other Ss who have been previously instructed to give 
incorrect answers. Asch found that an individual is 
not likely to deviate from the group; he is more 
likely to go against the evidence of his own senses
when the difference is most difficult to perceive, when
the group is unanimous against him, and when those 
in opposition increase in numbers. 

The present study employed Asch’s method, 
with the exception that four naive Ss were 
used, as in Crutchfield’s (3) modification. The 
present study did not employ Crutchfield’s 
electrical apparatus to record responses. 

In the present study, 50 volunteer Ss, 31 
males and 19 females, were drawn from four 
sections of an elementary psychology course. 
An attempt was made to control prestige ef- 
fects by selecting not more than two Ss from 
a class to participate in each group of four 
Ss. Four Ss were instructed to enter the ex- 
perimental room and take seats randomly 
among four partitioned chairs. The only per- 
son they could see was the experimenter, who 
was seated about 13 ft. in front of them at a 
small table. At each S’s desk was an instruc- 
tion sheet and three response cards numbered 
1, 2, and 3. In each of the 18 trials, the ex- 
perimenter displayed two large cards: the 
card on the left showed the standard line, and 
the card on the right showed three parallel 
comparison lines numbered consecutively 1, 
2, and 3. Then the four Ss made their judg- 
ments to the standard by raising a card num- 
bered to correspond with the selected com- 
parison line. 

The illusion of the procedure rested on 
each S’s belief that he was the fourth and final 
person to make a judgment. When the ex- 
perimenter reported the judgments of the first 
three Ss, he was creating a false group ma- 
jority. Actually, no one had reported. The 
pressure to conform to the group was artifi- 
cially produced. False majority judgments 
were reported in 9 of the 18 trials. Whether 
the experimenter reported a true or false 
majority response, it was unanimous; the first 
three Ss were reported to have given the same 
judgment. 

The differences to be discriminated were 
considerable; most unequal comparison lines 
were clearly longer or shorter than the stand- 
ard. The comparison lines differed from the 
standard by varying amounts between .5 and 
1.75 inches; no attempt was made to maintain 
a constant ratio between them. On successive 
trials the equal line appeared randomly in 
different positions. 


Darrell Gisvold 


After being subjected to the group con- 
formity situation, the group of 50 students 
was given the Edwards Personal Preference 
Schedule within a two-week period. 


Results 


A comparison of the data obtained from 
the students sampled in this study was made 
with both the results obtained by Asch in 
his measurement of group conformity and 
that of Edwards’ normative sample of the 
Personal Preference Schedule. 

A control study was made in order to de- 
termine whether the Ss’ conformity errors 
were due to indiscriminable line lengths. In 
the control group of 10 Ss, an average of .8 
errors were made in the 9 critical trials. The 
expected frequency of error on a chance basis 
should have been 67%. The frequency of 
error found was 9%. For the sample of 50 Ss 
used in this study, it was found that there 
were 32% conformity errors. This approxi-
mates the results of both Asch and Crutch-
field, who found about 30% conformity errors.
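The two percentages used in the control comparison follow from simple arithmetic, sketched below. The sketch assumes that a chance response means choosing at random among the three comparison lines, exactly one of which is correct; the variable names are illustrative.

```python
from fractions import Fraction

n_lines = 3           # comparison lines shown on each trial
critical_trials = 9   # the critical trials scored in the control study

# A purely random choice among three lines is wrong two times in three.
chance_error_rate = Fraction(n_lines - 1, n_lines)

# Control group: an average of .8 errors over the 9 critical trials.
control_error_rate = 0.8 / critical_trials

print(f"chance error rate  = {float(chance_error_rate):.0%}")
print(f"control error rate = {control_error_rate:.0%}")
```

The gap between the two rates is what supports the conclusion that the lines were discriminable and that the errors in the main experiment reflect conformity.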

An analysis of the difference between the 
subscale variances of the present sample and 
Edwards’ Normative sample was made by em- 
ploying the variance ratio, F. In all cases, the 
F ratios computed were not large enough to 
reject the null hypothesis at the .05 level of 
confidence. Further analysis was made by 
testing the subscale mean difference of the 
two samples. A t test was employed to test the
mean differences. The only significant differ-
ences found between the two population sam-
ples were on the Succorance subscale and the
consistency scores. The former was significant 
at the .01 level of confidence, and the latter 
was significant at the .05 level in a direction 
toward greater consistency of the present 
sample. 

The product-moment correlation between 
the scores on the Autonomy subscale and 
conformity scores from the group conformity 
situation was found to be -.54, significant
at the .02 level of confidence. Therefore, there 
is a high degree of assurance that the Au- 
tonomy subscale is measuring the need for 
Autonomy as described by Edwards. 

The product-moment correlation between 
the scores on the Deference subscale and con- 
formity scores from the group conformity 




Validity of the Autonomy and Deference Scales of the EPPS 


situation was found to be .17; this value was 
not large enough to reject the null hypothesis 
at the .05 level of confidence. The correlation 
ratio eta was computed in order to test for 
the possible existence of a curvilinear rela- 
tionship between regression lines, and no sig- 
nificant difference was found. 


Summary 

The aim of this study was to determine the 
empirical validity of the Autonomy and De- 
ference subscales of the EPPS. A group situa- 
tion developed by Asch to measure conform- 
ity behavior was used as the criterion. 

The correlation between the conformity
scores and the scores on the Autonomy sub-
scale was found to be -.54, significant at
the .02 level of confidence. Therefore, the
Autonomy subscale of the EPPS as described
by Edwards is empirically valid with respect
to the criterion of conformity behavior as de-
veloped for this study.

The correlation between the conformity
scores and the scores on the Deference sub- 
scale was not significant. On the basis of the 
criterion measure, conformity behavior, the 
Deference subscale does not predict an indi- 
vidual’s conformity behavior. These findings 
indicate that a person who has a need for 
Deference does not necessarily exhibit an 
equal need to conform to group situations. 


Received December 12, 1957. 


References 


1. Asch, S. E. Social psychology. New York: Pren- 
tice-Hall, 1952. 

2. Bernadin, A. C., & Jessor, R. A construct vali- 
dation of the Edwards Personal Preference 
Schedule with respect to dependency. J. con- 
sult. Psychol., 1957, 21, 63-67. 

3. Crutchfield, R. S. Conformity and character.
Amer. Psychologist, 1955, 10, 191-198.

4. Edwards, A. L. Edwards Personal Preference 
Schedule, Manual. New York: Psychological 
Corp., 1954. 

5. Murray, H. A. Explorations in personality. New
York: Oxford Univer. Press, 1938.




Journal of Consulting Psychology 
Vol. 22, No. 6, 1958 


Relationship Between Achievement Motivation 
Scores and Manifest Anxiety Scores 


Donald H. Kausler and E. Philip Trapp 
University of Arkansas 


A number of studies (e.g., 2, 4, 6, 7) have 
indicated that task performance may be a 
function of achievement motivation. In these 
studies, however, no effort was made to con- 
trol for other motivational or drive variables. 
Apparently, the assumption was taken that 
the other drive variables operate randomly in 
groups differentiated on measures of achieve- 
ment motivation. 

This assumption is questioned by the pres- 
ent authors. Measures of achievement, which 
are determined by test scores, may, in effect, 
be the product of a number of motivational 
factors, achievement motivation being but 
one. For example, performance on an achieve- 
ment test possibly could be related to the 
anxiety drive level of subjects. Groups formed 
on the basis of achievement test scores would 
then also represent groups differentiated with 
respect to anxiety drive level. 

The present empirical study is designed to 
investigate this conceivable relationship be- 
tween achievement motivation and manifest 
anxiety drive. More specifically, the study in- 
volves a comparison between two sets of scores 
—one set on an achievement test and the 
other on a test of manifest anxiety drive. The 
null hypothesis as involved in this study 
states that there is no relationship between 
level of achievement motivation and level of 
manifest anxiety drive. 


Method 


Subjects. One hundred and three male stu- 
dents in general and abnormal psychology 
courses at the University of Arkansas served 
as Ss. Only male Ss were used because of the 
strong possibility, as suggested by Atkinson 
(1), of a pronounced sex difference in achieve- 


ment motivation. All testing was conducted 
during regular class periods.

Instruments. The complex motivation test 
developed by French (4, 5) served as the 
achievement motivation measure. The test 
was scored for achievement motivation by the 
scoring system devised by French. Close 
agreement was found between test scores for 
two scorers working independently. 

The Taylor Manifest Anxiety Scale (8) 
served as the measure of anxiety drive level. 
The standard administration and scoring pro- 
cedure outlined by Taylor were followed. 

Procedure. The achievement motivation 
test, disguised as a “Test of Insight,” was 
administered early in the semester. The Ss 
were instructed that the study was part of a 
research project designed to develop a new 
psychological test. Approximately one month 
later, the anxiety scale was administered. This 
scale, titled “Biographical Inventory,” was
described to the Ss as a source of biographical 
information for comparing students at Arkan- 
sas with students elsewhere. Apparently, the 
Ss did not suspect a relationship between the 
two parts of the study. 


Results 


A correlation coefficient was computed be- 
tween the scores received on the two tests. 
Since the scores on both the French test and 
the Taylor test are ordinal values, a non-
parametric statistic, the Spearman rank-dif-
ference rho, was used. A rho of -.20 was ob-
tained, which is significant at the .05 level
of confidence (t = 2.05). Thus, when the en-
tire range of scores is considered, there is a
small, but significant, negative correlation be-
tween achievement and anxiety test scores.

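The reported t can be recovered from rho and the sample size. The sketch below uses the standard conversion t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; that this is the formula the authors applied is an assumption, although it reproduces their value.

```python
from math import sqrt

def t_from_correlation(r, n):
    """t statistic for testing a correlation of r from n pairs (n - 2 df)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

rho, n_subjects = -0.20, 103
t_value = t_from_correlation(rho, n_subjects)

print(f"t = {t_value:.2f} with {n_subjects - 2} df")
```

The sign of t simply follows the sign of rho; the report gives its magnitude.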

Table 1 


Achievement Motivation and Anxiety Drive Level: 
Observed and Expected Frequencies 


Anxiety drive        Achievement motivation
                     High          Low

High
Low

(The cell frequencies are not legible in the source.)


Note.—χ² = 6.17 (p < .02).


Since most studies employing achievement 
motivation and anxiety tests compare Ss only 
at the two extremes of the scoring range, an 
additional analysis of the data was performed. 
Four groups consisting of the top and bottom 
quartiles on both tests were determined (N 
= 26 in each quartile). A 2 X 2 contingency 
table was then established and tested by chi 
square. In the table, the two groups were high 
and low achievement motivation, and the two 
classifications were high and low anxiety 
drive. The results of this analysis are given
in Table 1. 

An inspection of Table 1 shows that high 
achievement motivation scores are related to 
low anxiety scores and that low achievement 
motivation scores are related to high anxiety 
scores. The over-all χ² is 6.17, which is sig-
nificant at the .02 level of confidence. When
only extreme groups are considered, the nega- 
tive relationship between the two variables 
becomes more pronounced. This lends addi- 
tional support for the rejection of the null hy- 
pothesis of no relationship between the two 
sets of scores. 
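The significance level attached to the chi square can be checked directly, since a 2 X 2 table has one degree of freedom. The sketch below relies on the fact that a chi square with 1 df is the square of a standard normal deviate, so its upper-tail probability is erfc(sqrt(x / 2)); the function name is illustrative, and no continuity correction is assumed.

```python
from math import erfc, sqrt

def chi_square_p_value_1df(chi_sq):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    With 1 df, chi-square is the square of a standard normal deviate, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(chi_sq / 2))

p_value = chi_square_p_value_1df(6.17)
print(f"p = {p_value:.4f}")
```

The resulting probability lies between .01 and .02, consistent with the level reported.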


Discussion 


Research, notably from the Iowa group (3, 
9), has demonstrated that performance level 
on various tasks may be affected by anxiety 
drive level. Consequently, the relationship be- 
tween achievement motivation and anxiety 
drive should definitely be taken into account 
in studies attempting to relate achievement 
motivation to performance. Otherwise, the ef-
fect of different levels of achievement motiva-
tion on performance might be contaminated
by corresponding systematic differences in
anxiety drive level. In other words, an at- 
tempt should be made to partial out differ- 
ences in anxiety drive level when Ss are differ- 
entiated on the basis of achievement test 
scores. 
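One way to partial out anxiety is a first-order partial correlation, sketched below. Only the -.20 between the two tests comes from this study, and it is a rank-order rho used here as a stand-in for a product-moment coefficient; the two correlations with performance are hypothetical values chosen purely for illustration, and the function name is likewise an assumption.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_ach_anx = -0.20    # reported correlation between the two tests
r_ach_perf = 0.30    # hypothetical: achievement motivation vs. performance
r_anx_perf = -0.25   # hypothetical: anxiety vs. performance

r_partial = partial_correlation(r_ach_perf, r_ach_anx, r_anx_perf)
print(f"achievement-performance r with anxiety held constant = {r_partial:.3f}")
```

With these illustrative inputs the achievement-performance correlation shrinks once anxiety is held constant, which is the kind of contamination the authors warn about.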

Two alternative interpretations are offered 
for the correlation between achievement and 
anxiety test scores. The first suggests that 
both tests are measures of the same general 
drive state. In this case, the correlation simply 
reflects the amount of overlap. The second 
interpretation suggests that the two tests are 
measures of two different but related drive 
components. In this case, the correlation simply 
reflects the degree of interaction. The resolu- 
tion of these opposing explanations is funda- 
mentally dependent upon further validity 
studies on both achievement motivation and 
manifest anxiety tests. Current evidence is in 
the direction of favoring the second interpreta- 
tion, but the lack of sufficient validation evi- 
dence indicates the need for further investiga- 
tion into the nature of the drive components 
measured by tests of human motivation. 


Summary 


A comparison was made between the scores 
received on the French Achievement Motiva- 
tion Test and the Taylor Manifest Anxiety 
Scale by 103 undergraduate male students. A 
rho of -.20 was obtained (p < .05). In ad-
dition, a chi square analysis of the data was
performed. In the resultant four-fold con- 
tingency table, the two groups consisted of the 
top and bottom quartiles on the achievement 
motivation test, and the two classifications 
consisted of the top and bottom quartiles on 
the anxiety test. The over-all χ² was 6.17
(p < .02). The results clearly reflect a sig- 
nificantly negative relationship between the 
two sets of scores, and indicate the need to 
partial out the anxiety drive component in 
studies designed to measure the effects of 
achievement motivation. The question of 
whether the Taylor scale and the French test 
are actually measuring two aspects of the 
same drive state or aspects of two interrelated 
drive states is raised. 


Received December 23, 1957.


In recent years, there has been consider-
able effort devoted to investigating the char-
acteristics of the achievement motive (n-Ach). 
For this purpose, three tests have been pre-
sented which are suitable for research involv-
ing large groups: the n-Ach scale of the Ed-
wards Personal Preference Schedule (2), French’s
Test of Insight (3, 4), and McClelland’s n- 
Ach (8). The latter two tests are projective 
tests of n-Ach, while the Personal Preference 
Schedule is an objective, forced-choice test 
which provides, among others, a scale of 
achievement. In a recent study, Bendig (1) 
reported a correlation of .11 between McClel- 
land’s projective measure of n-Ach and Ed- 
wards’ objective n-Ach scale. The obtained 
correlation, with N = 244, was not significant
at the .05 level. The present study is an at- 
tempt to verify Bendig’s findings and, at the 
same time, investigate the relationship be- 
tween two projective tests of n-Ach, McClel- 
land’s and French’s. 


Procedure 


The Ss in this study were 298 members of 
an entering class of the Air Force Academy, 


1 This investigation was carried out under the Air 
Force Personnel and Training Research Center in 
support of Project No. 7719. Permission is granted 
for reproduction, translation, publication, and use or 
disposal in whole or in part by or for the United 
States Government. 

2 With the Personnel Laboratory, WADC, when 
this study was conducted. 

3 The authors would like to express appreciation
to Raymond E. Christal, Eli S. Flyer, John D. 
Krumboltz, and Ernest C. Tupes, for their assistance 
in planning the study and reading portions of the 
manuscript. 


Denver, Colorado. The Ss were tested in July, 
1957, within one or two days after reporting 
to the academy. The three tests were ad- 
ministered within a period of two days as 
part of a larger testing program. 


Pictures A, B, D, and E of the McClelland n-Ach 
test (8, p. 375) were used in this study. All stories 
were independently scored by two psychologists after 
practice with stories in the manual. Before scoring 
the papers obtained from the present sample, the 
two scorers compared their scores with those of the 
authors of the McClelland scoring system (8, pp. 
335-374). Correlations in the .80’s were obtained for 
the scorers when compared with the manual. This is 
similar to the level of agreement obtained by other 
users of this scoring system (6). All 298 papers were 
scored by both raters. In addition to the independ- 
ent scores obtained by the two psychologists, a 
“joint score” was obtained. This score was obtained 
as follows: Differences of three points or less in to- 
tal score were averaged; larger differences were 
settled by rescoring the stories jointly. Interscorer 
reliability before the scoring conference was .714 for 
the entire sample. 
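
The joint-score rule described above can be written out as a minimal sketch. The function name and the `rescore_jointly` callback are hypothetical; the callback merely stands in for the scorers' conference, whose outcome cannot be computed.

```python
def joint_score(score_a, score_b, rescore_jointly):
    """Combine two raters' total scores for one paper.

    Differences of three points or less are averaged; larger differences
    are passed to `rescore_jointly`, a hypothetical callback representing
    the joint rescoring of the stories by both scorers.
    """
    if abs(score_a - score_b) <= 3:
        return (score_a + score_b) / 2
    return rescore_jointly(score_a, score_b)


# Illustrative values only (not data from the study).
print(joint_score(5, 7, rescore_jointly=lambda a, b: None))
print(joint_score(2, 9, rescore_jointly=lambda a, b: 6))
```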

Form I of French’s Test of Insight (4, p. 5) was 
used in this study. The scoring system for this test 
is similar in many respects to McClelland’s system. 
This test was scored after the raters had obtained 
considerable practice in scoring the McClelland n- 
Ach test. The raters independently scored two intact 
testing groups (N = 77). Rater reliability for these 
cases was .70. Each scorer then scored approximately 
half of the remaining cases. The “joint score” in this 
instance was obtained for the first 77 cases, using the 
same procedure as described above for the McClel- 
land joint score. 


Results and Discussion 


It can be seen from Table 1 that no sig- 
nificant relationship was obtained among the 
three measures of n-Ach. The results bear out 
Bendig’s finding of a lack of relationship be- 
Table 1

Intercorrelations of the Measures of Need Achievement

                          Tests

French-McClelland   French-Edwards   McClelland-Edwards

   −.024 (172)        .105 (172)        −.018 (298)
    .047 (203)        .082 (207)         .010 (298)
   −.067 (77)         .019 (77)          .003 (298)

Note.—Numbers in parentheses refer to size of sample.


tween an objective and projective measure of 
n-Ach. In addition, the present study failed 
to find a significant relationship between two 
projective measures of n-Ach. In view of the 
fairly high interscorer reliabilities for tests of 
this nature, one would expect, if they are 
actually measuring a “general need for 
achievement,” that a significant relationship 
would exist between them. 

There are no published reports of the test- 
retest reliability of the Test of Insight. How- 
ever, in view of the extremely low test-retest 
reliabilities reported for the McClelland n- 
Ach test for one-week (7) and nine-week (6) 
intervals, it is possible that lack of stability
is characteristic of projective tests of need 
achievement. Thus, the lack of relationship 
of these measures with each other may be due 
to a lack of reliability. Another possibility, of 
course, is that some or all of these measures 
lack validity. 

It should be pointed out that, although this 
study indicates that these measures are unre- 
lated, their “validity” was not investigated. 
It is possible that the three scores are meas- 
ures of three independent but real traits 


whose only similarity is the names assigned 
them. This would indicate that some redefin- 
ing is necessary. At any rate, it should be 
pointed out, as others (5) have done, that it 
is dangerous to use interchangeably traits 
that are similarly defined but measured by 
different instruments. 


Summary 


The problem under investigation concerned 
relationships existing between three measures 
of achievement motivation: the n-Ach scale of 
the Edwards Personal Preference Schedule, 
McClelland’s test, and French’s Test of In- 
sight. Fairly high interscorer reliabilities were 
obtained for the two projective tests of 
achievement. However, no significant relation- 
ships were found for the three measures pur- 
porting to measure achievement motivation. 


Received December 16, 1957. 


References 


1. Bendig, A. W. Manifest anxiety and projective
and objective measures of need achievement.
J. consult. Psychol., 1957, 21, 354.

2. Edwards, A. L. Manual for Edwards’ Personal
Preference Schedule. New York: Psychological
Corp., 1954.

3. French, Elizabeth G. Some characteristics of
achievement motivation. J. exp. Psychol.,
1955, 50, 232-236.

4. French, Elizabeth G. Development of a measure
of complex motivation. USAF Pers. Train.
Res. Cent., Res. Rep., 1956, No. AFPTRC-TN-56-48.

5. Iscoe, I., & Lucier, O. A comparison of the Revised
Allport-Vernon Scale of Values (1951)
and the Kuder Preference Record (Personal).
J. appl. Psychol., 1953, 37, 195-196.

6. Krumboltz, J. D., & Farquhar, W. W. Reliability
and validity of the n-achievement test.
J. consult. Psychol., 1957, 21, 226-228.

7. McClelland, D. C. (Ed.) Studies in motivation.
New York: Appleton-Century-Crofts, 1955.

8. McClelland, D. C., Atkinson, J. W., Clark, R. A.,
& Lowell, E. L. The achievement motive. New
York: Appleton-Century-Crofts, 1953.




Journal of Consulting Psychology 
Vol. 22, No. 6, 1958 


Differentiation of Diagnostic Groups by 
Individual MMPI Scales¹


Albert Rosen 
University of Maryland 


A multiscaled instrument such as the MMPI 
is generally interpreted in terms of configura- 
tions of scale patterns and elevations. Infor- 
mation concerning the concurrent validity of 
individual scales can nevertheless be of con- 
siderable value. Besides indicating the degree 
of differentiation possible between diagnostic 
groups by means of individual scales, such in- 
formation can also contribute to the enrich- 
ment of configural kinds of interpretations in 
the psychiatric setting. Of even broader sig- 
nificance is the fact that data on individual 
scale discrimination can add to our total com- 
prehension of the meanings inherent in a spe- 
cific scale and therefore are apt to suggest 
other significant relationships and inferences. 

Although the present study was part of a 
larger project and not planned as a replica- 
tion, the findings are of special interest be- 
cause of another report that virtually no 
differentiation between psychiatric groups is
possible using individual MMPI scales. Rubin 
(6, 7) has reported that the MMPI cannot 
significantly differentiate between diagnostic 
groups in a VA psychiatric hospital. He se- 
lected at random 93 MMPI records of pa- 
tients, the total group consisting of 8 chronic 
alcoholics without psychosis, 24 psychopaths, 
33 schizophrenics, and 28 psychoneurotics. 
Using the analysis of variance method, he 
found that only one clinical scale, Sc, differ- 
entiated in some manner between the four 
diagnostic groups. This paper presents addi- 
tional data on the discriminating power of the 


1 This research was done in the Psychiatry Service
of the Minneapolis VA Hospital. The investigator is 
indebted to Leonard I. Schneider for suggestions 
leading to improved organization and clarity of the 
report. 


individual MMPI scales, as well as some pos- 
sible explanations of Rubin’s negative results. 


Procedure 


Five groups from the psychiatry service of 
the Minneapolis VA Hospital were chosen 
which were composed of 307 patients diag- 
nosed according to the Veterans Administra- 
tion psychiatric nomenclature as follows: (a) 
anxiety reaction (anxiety state)—83 cases; 
(b) conversion reaction (conversion hysteria)
—49 cases; (c) depressive reaction (neurotic
depression)—36 cases; (d) somatization reaction
(psychophysiological reaction, psychosomatic
reaction, organ neurosis)—39 cases;
and (e) schizophrenic reaction, paranoid type
—100 cases.

The few published studies providing objec- 
tive data for the evaluation of diagnostic reli- 
ability have emphasized that there is a high 
degree of inconsistency in psychiatric diag- 
nosis (2, 3, 5). If diagnostic unreliability does 
prevail, psychological research based upon a 
diagnostic criterion is apt to be of limited 
value, for patients classified within any one 
diagnostic group will be quite heterogeneous, 
and there will be considerable overlapping be- 
tween groups. 

Because of the importance to the present 
study of obtaining homogeneous diagnostic 
groups, only so-called “pure” cases were se- 
lected for each sample. Thus, a case was ex- 
cluded if there was evidence that the patient 
had been given (a) two different diagnoses 
upon successive admissions, or (b) a diagnosis
containing a statement of overlapping
trends, as for example, “anxiety reaction with 
paranoid and schizoid trends.” There was, 
however, one exception to this rule. Inasmuch 
Fig. 1. Mean T scores for three diagnostic groups on each MMPI scale.


as a large number of anxiety reaction cases 
suffer from depression, it was not feasible to 
exclude cases of “anxiety reaction with de- 
pression.” 

The procedure in selecting patients for the 
five diagnostic groups was as follows: 


1. From a file listing the final diagnoses of all 
patients who had been tested, the names were se- 
lected of “pure” cases who had been diagnosed in 
one of the five categories listed above.²

2. For the purpose of increasing the homogeneity 
of each diagnostic group, there were included in the 
study only male, white veterans who had been tested 
within 15 days after admission and who had not re- 
ceived insulin or electroshock therapy prior to the 
administration of the MMPI. The median number of 
days between admission and MMPI administration 
for each of the five final samples was between 3 
and 4. 

3. The MMPI records of the patients selected were 
obtained from their test folders, and a few addi- 
tional cases were removed on the basis of extremely 
high L, F, K, and ? scores. In the final samples, the 
median number of ? items for each of the five diagnostic
groups was 2-4, Q3 was 6-14, and only 15
records contained more than 30 ? items. 


Results 


The following procedure was used to test 
the significance of the differences in means 
of each MMPI scale for the five diagnostic 
groups: 

2 An exception was the group of patients diagnosed
paranoid schizophrenia. Because of their abundance,
100 cases were selected from those most recently
discharged.


1. The Bartlett test (4, p. 97) was applied and 
heterogeneity of variances (p<.05) of the five 
groups was found for six of the scales: F, K, Pd +
.4K, Pa, Sc + 1K, and Si.

2. For each of these six scales, critical ratios were 
calculated for the 10 possible differences between 
means. 

3. For each of the other seven scales with homo- 
geneous variances, an analysis of variance test was 
made. 
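
The count of comparisons implied by the steps above, and the form of a critical ratio, can be sketched as follows. The report does not give the formula used for the critical ratios; the unpooled standard error shown here is an assumption, and the summary values passed to it are hypothetical.

```python
from itertools import combinations
from math import sqrt

GROUPS = ["Paranoid Schizophrenia", "Depression", "Anxiety",
          "Somatization", "Conversion"]
N_SCALES = 13  # six scales with heterogeneous variances plus the other seven

# Every unordered pair of the five groups yields one difference between means.
PAIRS = list(combinations(GROUPS, 2))


def critical_ratio(mean1, var1, n1, mean2, var2, n2):
    """Difference between two means divided by its standard error,
    computed without pooling the two variances (an assumed form)."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)


# Hypothetical raw-score summaries for one scale (not values from the study).
cr = critical_ratio(30.0, 100.0, 100, 26.0, 64.0, 49)
print(len(PAIRS), len(PAIRS) * N_SCALES, round(cr, 2))
```

Ten pairs on each of 13 scales give the 130 possible differences in means referred to in the Results.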


The mean scores of the diagnostic groups 
for each MMPI scale are shown in Fig. 1 and 
listed in Table 1. These T scores are rounded 
numbers because all calculations were made 
with raw score means and variances. The F 
ratio is also recorded in Table 1 for the seven 
scales for which this statistic was computed. 
Since the curves for the Depression and Anx- 
iety groups and for the Somatization and Con- 
version groups are almost identical, the me- 
dian profiles for only three groups are shown 
in Fig. 1 in order to avoid a confusion of 
lines. On seven of the scales there is an or- 
derly hierarchy of mean scores, with the 
Paranoid Schizophrenia group being highest, 
and the Depression, Anxiety, Somatization, 
and Conversion groups following in order. 
These seven scales are F, Pd + .4K, Mf, Pa,
Pt + 1K, Sc + 1K, and Si. On the D scale, 
this order for the five groups is the same ex- 
cept for a reversal of the Depression and 
Paranoid Schizophrenia groups. In the profiles 
of the Somatization and Conversion groups, 
the Hs + .5K, D, and Hy scores are most 
elevated with D slightly lower than the other 
two. This is the so-called “V profile,” frequently
noted for individual cases diagnosed
as conversion or somatization reaction.

Among the 130 possible differences in mean
scale scores for the five diagnostic groups, 61
are significant at or beyond the .05 level, as
listed in Table 2. Eleven such differences
would be expected by chance if the statisti-

Table 1
Mean T Scores of Five Diagnostic Groups on Each MMPI Scale

                              Diagnostic groups

            Paranoid
            schizo-                              Somatiza-
            phrenia    Depression   Anxiety      tion        Conversion   F
Scale       (N = 100)  (N = 36)     (N = 83)     (N = 39)    (N = 49)     Ratio

L              50         ...          ...          52          50         2.78*
F              70         ...          ...          54          54
K              51         ...          ...          56          54
Hs + .5K       71         ...          ...          79          76         2.33
D              81         ...          ...          71          70         8.94**
Hy             70         ...          ...          72          73        <1.00
Pd + .4K       74         ...          ...          59          59
Mf             64         ...          ...          55          53        11.87**
Pa             72         ...          ...          57          55
Pt + 1K        81         ...          ...          66          62        16.16**
Sc + 1K        87         ...          ...          62          60
Ma + .2K       63         ...          ...          56          58         6.65**
Si             64         ...          ...          53          51

* Significant at .05 level.
** Significant at .001 level.
[Means for the Depression and Anxiety groups are not legible in the source.]


Table 2
Diagnostic Groups Significantly Different in Mean Scores on Each MMPI Scale

                              Diagnostic groups

            Paranoid
            schizo-
MMPI        phrenia    Depression   Anxiety      Somatization
Scale       (N = 100)  (N = 36)     (N = 83)     (N = 39)

            Mean score of above groups higher than score of —

[Table body not legible in the source.]

Note.—Each capital letter in the body of the table denotes a diagnostic
group: P—Paranoid Schizophrenia; D—Depression; A—Anxiety; S—Somatization;
C—Conversion. The table ... significant differences (p < .05) between the
five diagnostic groups on the various scales. For example, on the L scale,
... differences; the Paranoid Schizophrenia group scores higher than the
Anxiety and Depression groups and the Somatization group has a higher mean
score than the Anxiety and ... groups.

cal tests were independent (8). Of these 61
differences, 17 are significant at the .05 level, 
8 at the .01 level, 5 at the .001 level, and 31 
at or beyond the .0001 level. The Anxiety and 
Depression groups are significantly different 
from each other only on the D scale, and the 
Somatization and Conversion groups are simi- 
lar on every scale. If the Anxiety and Depres- 
sion groups are combined, and the Somatiza- 
tion and Conversion groups are also treated 
as one group, each of these two groups is ef- 
fectively differentiated from the other and 
from the Paranoid Schizophrenia group. Cross- 
validation of this finding is necessary, how- 
ever, for the combining of groups was done 
after inspection of the data. Neither the Hs 
+ .5K nor the Hy scale differentiates the 
Somatization or Conversion groups from the 
other three groups. 
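
The tally of significant differences reported above can be checked with a few lines of arithmetic. The sketch below also computes, purely as an illustration, the mean number of differences significant at the .05 level and a binomial tail probability under the assumption of 130 independent tests. It does not attempt to reproduce the figure of eleven cited from (8), whose derivation is not given in the text, and the tests here share groups and so are not strictly independent.

```python
from math import comb


def binomial_tail(n, p, k):
    """P(X >= k) for X distributed Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


N_TESTS, ALPHA = 130, 0.05

# Counts of significant differences by level, as reported in the text.
tally = {".05": 17, ".01": 8, ".001": 5, ".0001": 31}
total_significant = sum(tally.values())

# Mean count of nominally significant results if every null were true
# and the tests were independent (an idealization).
mean_false_positives = N_TESTS * ALPHA

# Probability of at least the observed count under the same idealization.
p_at_least_observed = binomial_tail(N_TESTS, ALPHA, total_significant)
print(total_significant, mean_false_positives, p_at_least_observed)
```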
Discussion 

The positive findings of the present study 
contrast sharply with the almost complete 
lack of MMPI differentiation between Rubin’s 
diagnostic groups. Three possible explana- 
tions of the divergent results will be dis- 
cussed as follows: (a) High ? scores in 
Rubin’s records; (b) criterion contamination
in the present study; and (c) criterion un- 
reliability in Rubin’s groups. 

High ? scores. Rubin (7) reported that 9 
of the 93 MMPI records in his study con- 
tained more than 100 unanswered items. The 
writer has seen the complete distribution of 
? T scores³ for the 93 records, and found that
34, or 37% of the records, contain more than 
30 unanswered items. Of these 34, 16 have 
31-50 such items, 7 have 51-90, 5 have 91- 
110, and 6 have more than 130 ? items. 

Rubin also indicated that 21% of his 
records of psychotics had more than 100 un- 
answered items, whereas only 7% of the neu- 
rotics’ records and none of the records of psy- 
chopaths had more than 100 ? items. More- 
over, the mean ? scores of Rubin’s psychotic, 
neurotic, and psychopathic groups are also in 
this descending order. The effect of a large 
number of unanswered items in any record is 
to sharply reduce the scores on the MMPI 


3 The distribution of ? scores was obtained through
the courtesy of Paul E. Meehl. 
scales. If the number of unanswered items is
highest for a group of psychotic patients, 
next highest for neurotics and still lower for 
psychopaths, the scores of the MMPI scales 
of the psychotic and neurotic groups might be 
reduced to the extent that these two groups 
could not be distinguished from each other 
or from the psychopathic group. This phe- 
nomenon has apparently occurred in Rubin’s 
study, and could account in large part for 
his negative results.⁴

Contamination of the criterion. The psy- 
chiatrists who made the final diagnoses of 
the cases included in this study had access 
to the psychologists’ interpretations of the 
MMPI records. It should be noted, however, 
that the MMPI interpretations were made in 
terms of the scale configurations, and there- 
fore should have had minimal effect on the 
differentiating power of any individual scale. 
A further mitigating feature is that the psy- 
chiatrists claimed they were influenced by the 
MMPI interpretations primarily in the diagnosis
of “borderline” cases. Nevertheless, the
criterion contamination, although unavoidable,
presents a serious limitation on the
evaluation of the results because its effects 
cannot be precisely measured. Replication of 
this investigation in a clinical setting where 
MMPI results do not contribute to final diag- 
nosis would be desirable. 

Diagnostic unreliability. In studies in which 
a diagnostic criterion is used, divergent re- 
sults may often partly be attributed to the 
nonhomogeneity of diagnostic categories and 
to the resultant inconsistency of diagnoses. 
Any investigation based on highly unreliable 
criteria is doomed to negative or inconclusive 
outcomes. Rubin did not mention any attempt 
at diagnostic purification, and, in fact, the 
category of alcoholism, as Aaronson and
Welsh (1) have indicated, generally overlaps


4 Aaronson and Welsh (1) have also discussed the
possible effects of extremely high ? scores in Rubin’s 
study. 

5 As a result of the misleading and indeterminable 
effect of high ? scores, it has been the practice at the 
Minneapolis VA Hospital for the past six years to 
discourage patients from making any ? responses. 
The Cannot Say cards have been removed from 
MMPI boxes to implement this practice, so that few 
records contain any unanswered items. The statistics 
reported here were computed in 1952. 
considerably with the psychopathic and neurotic
categories.

The null hypotheses which were tested in 
Rubin’s study and in the present investiga- 
tion can now be restated in order to evaluate 
further the divergent results and to empha- 
size the kinds of decisions which might be 
made in future research into the differenti- 
ability of diagnostic groups by means of the 
individual MMPI scales. 

Rubin’s null hypothesis may be formulated 
as follows: Individual MMPI scales do not 
differentiate the broad categories of neurosis, 
psychosis, psychopathic personality, and al- 
coholism without psychosis if the patients are 
selected without regard to diagnostic purity 
and to the number of unanswered items in 
their MMPI records. It is a contribution to 
our knowledge of the effectiveness of the 
MMPI to learn that Rubin’s null hypothesis 
cannot be rejected when the MMPI is used 
in the manner which he employed. It re- 
mained to be determined, however, how effec- 
tive the MMPI is in differentiating diagnostic 
groups when the size of the ? scores is con- 
trolled and a reasonable degree of homoge- 
neity is present in the criterion groups. 

The null hypothesis tested in the present 
investigation may be stated as follows: Indi- 
vidual MMPI scales do not differentiate be- 
tween groups of patients diagnosed in the 
categories of paranoid schizophrenia, neurotic 
depressive reaction, anxiety reaction, conver- 
sion reaction, and somatization reaction if 
(a) “pure” diagnostic groups of patients are 
selected (with the exception that cases of 
anxiety reaction with depression are included),
(b) the number of ? items in the
MMPI records is kept to a minimum, and 
(c) an unmeasurable amount of criterion con- 
tamination occurs. It was found that the null 
hypothesis could substantially be rejected ex- 
cept for the comparison of the Depression 
and Anxiety groups, exclusive of the D scale, 
and likewise in the comparison of the Con- 
version and Somatization groups. If, however, 
the Depression and Anxiety groups are treated 
as one class, and the Conversion and Somati- 
zation groups are combined, each of these two 
groups is effectively differentiated from the 
other and from the Paranoid Schizophrenia 
group. 


Summary 


A major purpose of this paper was to re- 
port an investigation into the effectiveness of 
the individual MMPI scales in diagnostic 
group differentiation. A correlated aim was 
to test the generality of the findings of Rubin 
who reported that only one MMPI scale dif- 
ferentiated in any manner between four diag- 
nostic groups: psychotics, neurotics, psycho- 
paths, and alcoholics without psychosis. 

A total of 307 patients was selected who 
had been diagnosed in five categories: para- 
noid schizophrenia, neurotic depressive reac- 
tion, anxiety reaction, somatization reaction, 
and conversion hysteria. An attempt was 
made to select for each diagnostic category 
only “pure” cases without overlapping diag- 
nostic trends whose MMPI records contained 
a minimum of unanswered items. Of 130 possible
differences in means for 13 scales, 61
were significant at or beyond the .05 level. 
The positive findings of the present study, 
therefore, contrasted sharply with the results 
obtained by Rubin. Three possible explana- 
tions of the divergent results discussed were: 
large numbers of unanswered MMPI items 
and diagnostic overlapping in Rubin’s groups 
and criterion contamination in the present 
study. 


Received December 16, 1957. 
Strong’s first interest-maturity scale was developed
by contrasting the responses of 55-year-old
men and 15-year-old boys to items
of his Vocational Interest Blank. Mean scores 
on this scale showed a rapid increase from age 
15 to about age 21, then a more gradual rise
to about age 50. It seems reasonable to sup- 
pose that responses to certain personality- 
inventory items might change as a function of 
age and that a scale composed of such items 
might be sensitive to some aspects of the 
maturational process. The present research 
constitutes an inquiry into the feasibility of 
such a scale. 

A 160-item inventory was administered to 
32 15-year-old boys and 37 15-year-old girls.
The boys’ responses to each item were com- 
pared with those of 50 adult males, the girls 
with those of 50 adult females. Twenty-one 
items with which adolescents agreed signifi- 
cantly more often than the same-sex adults 
were thus identified. 
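
The item-selection and scoring steps described above can be sketched as follows. The brief report does not say which significance test was applied to each item; the pooled two-proportion z statistic used here is an assumption, and the agreement counts and item numbers are hypothetical.

```python
from math import sqrt


def two_proportion_z(agree_1, n_1, agree_2, n_2):
    """z statistic for the difference between two independent proportions,
    using the pooled proportion for the standard error."""
    p1, p2 = agree_1 / n_1, agree_2 / n_2
    pooled = (agree_1 + agree_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p1 - p2) / se


def scale_score(responses, scale_items):
    """Number of scale items with which the respondent agreed."""
    return sum(1 for item in scale_items if responses.get(item, False))


# Hypothetical item: 24 of 32 boys agree, against 20 of 50 adult males.
z = two_proportion_z(24, 32, 20, 50)

# Hypothetical respondent and scale: item number mapped to agreement.
score = scale_score({1: True, 2: False, 3: True}, [1, 2, 3, 4])
print(round(z, 2), score)
```

An item would be retained when the adolescents' agreement rate exceeds the adults' by more than the chosen critical value; the scale score is then a simple count of agreements with the retained items.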

These items were now treated as a scale, 
and inventories of Ss in various age groups 
were scored for the number of agreements 
with the items. An analysis of variance showed 
a highly significant increase in mean scores 
for high-school boys over the age groups of 
16, 17, and 18-19 years (p < .001). Mean 
scores of girls also significantly increased over 
the available age groups of 15, 16, and 17 

1 An extended report of this study may be obtained
without charge from Richard H. Walters, De-
partment of Psychology, University of Toronto, To- 
ronto 5, Ontario, or for a fee from the American 
Documentation Institute. Order Document No. 5697, 
remitting $1.25 for microfilm or $1.25 for photo- 
copies. 


years (p < .02). A graphical representation 
of results showed a relatively rapid increase 
in mean scores during adolescence, and a 
slower increase during early adult years. Data 
from male industrial employees suggest that 
scores may continue to increase slowly up to 
50 or 60 years. 

Since the total number of agreements to all 
160 items did not significantly increase for 
male Ss from 15 to 18-19 years, changes in 
scale scores cannot simply reflect a greater 
readiness of adolescents to agree with items 
regardless of their content. The mean scores 
of 16 high-IQ and 16 low-IQ Ss were not 
significantly different. Since these Ss were 
matched approximately for age, the scale ap- 
parently reflects something more than an in- 
crease in mental age. 

The meaning of the increase in scores with 
age is nevertheless quite unclear at the pres- 
ent time. The changes may have no signifi- 
cance whatsoever beyond the fact that they 
occur. Even so, they constitute a warning to 
psychologists who administer personality in- 
ventories to adolescent Ss and then interpret 
them with the aid of adult norms. The ques- 
tion should not, however, be left thus un- 
decided. High-scoring and low-scoring Ss in 
the same age range might be compared in 
their responses to other types of stimuli. If 
significant correlates of scale scores were thus 
identified, it might be possible to construct a 
scale which reflected important aspects of the 
maturational process. 


Brief Report. 
Received June 6, 1958. 




“Ideal Self” Instructions, MMPI Profile Changes, and 
the Prediction of Clinical Improvement¹


Gerald M. Rapaport²


Fort Dix, New Jersey 


A number of recent studies on the Minne- 
sota Multiphasic Personality Inventory have 
dealt with the problem of response sets in- 
herent in that test (1, 6, 7) and with the ef- 
fects of experimentally induced instructional 
sets on test performance (4, 8, 10, 11). These 
studies were undertaken to specify those vari- 
ables which may influence patterns of re- 
sponse apart from the content of the test 
items and to determine whether these sys- 
tematic effects can be utilized to improve the 
diagnostic and prognostic merit of the instru- 
ment. 

The present study investigated the effects 
of an “ideal self” instructional set on test 
performance. The following hypotheses were 
put forth: (a) the introduction of an “ideal 
self” set produces changes in the MMPI pat- 
ternings of psychiatric patients relative to the 
values obtained under the customary instructional
procedures; (b) the changes in MMPI
patterns are in the direction of fewer elevated 
scale scores; (c) the more deviant the original 
MMPI record, the more marked the change; 
(d) the interindividual variability is signifi- 
cantly less on the “ideal self” pattern than on 
the “real self” pattern. 

A study by Grayson & Olinger (10) closely 
paralleled the present design. Employing the 
novel instructional set “typical well-adjusted 
adult on the outside,” the authors found that 
the ability to improve MMPI performance on 
the second testing was a favorable prognostic 


1 The author wishes to express his gratitude to the 
following individuals for their aid in data collection, 
statistical analyses, and for their suggestions: Robert
J. Marshall, Arthur Burdett, Bernard Doran, Charles 
Henin, Alfred Herbert, David Perry, and John Ryan. 

2 Currently at Illinois Institute of Technology. 


index of early hospital discharge. In the pres- 
ent study, an attempt was made to determine 
whether Grayson & Olinger’s prognostic find- 
ing held true under the present instructional 
procedures. 


Tests of the Hypotheses 
Procedure 


The sample used in the present study con- 
sisted of 48 military personnel, all of whom 
were psychiatric patients. The subjects (Ss) 
were drawn over a six-month period from the 
psychiatric facilities of an Army basic training
center.³ Thirty-three Ss were seen as out-
patient referrals, while the remaining 15 were 
psychiatric inpatients. Diagnostically, the 
sample consisted of 12 cases of schizophrenia, 
11 individuals diagnosed NND (no neuro- 
psychiatric disease), 20 character and be- 
havior problems, two psychotic depressions, 
and three cases of epilepsy. The MMPI 
was administered to each S twice, first under 
the customary “real self” instructional set 
and, immediately thereafter, under the “ideal 
self” instructions. The patients invariably ac- 
cepted the second MMPI as part of the 
routine clinical psychological workup. 


Results and Discussion 


Table 1 shows the means for “real self” and
“ideal self” administrations of the test. The 
group's “real self” pattern reflected a good 
deal of psychiatric distress, while the “ideal 
self” pattern manifested an absence of clini- 
cal pathology. The hypothesis which pre- 
dicted that the new instructional set would 


3 This reflects an incidence rate of .004 per thou- 
sand post population per annum. 
Table 1
Means, Standard Deviations, and Tests of Significance of “Real Self” and “Ideal Self” Differences

          Mean      Mean      Mean      SE
Scale     Real      Ideal     Change    Diff.

L          4.87      9.54      4.67      .53
F          9.58      5.73      3.85      .70
K         13.08     19.56      6.48      .78
Hs        20.56     14.31      6.25     1.27
D         30.41     18.83     11.58     1.54
Hy        28.21     20.68      7.42     1.13
Pd        27.37     23.25      4.12      .85
Mf        26.43     24.54      1.89      ...
Pa        15.00      9.87      5.12      .75
Pt        37.35     25.71     11.64     1.59
Sc        38.87     28.02     10.85     1.89
Ma        23.02     23.04       .02      .83

* Significant beyond .05 level.
** Significant beyond .01 level.
Note.—These figures are expressed in MMPI raw scores.
[The SD and t columns are not legible in the source.]


produce statistically significant changes in the 
over-all MMPI pattern was confirmed as evidenced
by 10 t’s significant at the .01 level of
confidence and one t significant at the .05
level. The second hypothesis, which proposed 
that the change would be in the direction of 
greater freedom from mental distress was like- 
wise confirmed, as evidenced by the statisti- 
cally significant downward changes of the F, 
Hs, D, Hy, Pd, Pa, Pt, and Sc scales. 

The third hypothesis was tested by corre- 
lating each individual’s “real self” score on 
each scale with the discrepancy scores (real 
self minus ideal self). The Pearson r’s are
shown in Table 2. Significant relationships, 
established for each of the nine clinical scales, 
indicate that the more pathological the “real 


Table 2 
Correlations Between Real Self and Extent of Change 


Scale 


Hs 
D 
Hy 
Pd 
Mf 
Pa 
Pt 
Sc
Ma 


* Significant beyond .01 level of confidence. 


self” score, the greater the discrepancy be- 
tween “real self” and “ideal self.” 
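
The test of the third hypothesis can be illustrated with a minimal sketch that correlates each subject's “real self” score on a scale with his discrepancy score (real self minus ideal self). The scores below are hypothetical raw scores for five subjects, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores on one scale (not values from the study).
real_self = [20, 35, 28, 41, 25]
ideal_self = [18, 22, 20, 24, 19]

# Discrepancy score as defined in the text: real self minus ideal self.
discrepancy = [r - i for r, i in zip(real_self, ideal_self)]
r_value = pearson_r(real_self, discrepancy)
print(round(r_value, 3))
```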

The final hypothesis was tested by com- 
paring for each scale the standard deviation 
of the “real self” distribution with the stand- 
ard deviation of the “ideal self” patterns. 
Ten of the twelve MMPI scales shifted in the 
predicted direction, four of the variability 
changes (Hs, D, Pt, Sc) attaining statistical 
significance as measured by the F test (5, pp. 
163-165). The probability of obtaining the 
present distribution of variability changes by 
chance is less than .05 as tested by chi square. 
Therefore, it may be concluded that patients 
taking the MMPI show a higher degree of 
correspondence in specifying what they regard 
as “ideal” behavior than in assessing their 
present state of psychological adjustment. 
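
A variance-ratio comparison of this kind might be sketched as below. The standard deviations are hypothetical, and the sketch treats the two administrations as independent samples; the procedure cited in the text (5, pp. 163-165) may differ, for example by allowing for both administrations coming from the same patients.

```python
def variance_ratio(sd_a, n_a, sd_b, n_b):
    """Return (F, df_numerator, df_denominator), larger variance on top."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Hypothetical SDs for one scale: "real self" (a) versus "ideal self" (b),
# with 48 patients in each administration.
f, df1, df2 = variance_ratio(sd_a=12.0, n_a=48, sd_b=8.0, n_b=48)
```

The resulting F would then be referred to a table of F for the stated degrees of freedom.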

In view of the close correspondence between 
the methodologies of the Grayson-Olinger 
study (10) and the present investigation, a 
comparison of the two sets of findings was 
undertaken. The mean “real self” and novel 
instructional set profiles for the two samples 
are presented in Fig. 1. An inspection of the 
plotted means shows that on the “real self” 
administration, the present sample manifested 
fewer psychopathic trends but more neurotic 
and psychotic features than the Grayson- 
Olinger sample, while on the novel instruc- 
tional sets the two profiles were highly similar 
in shape. It appears that the new instructional 
sets employed in the two studies have exerted 
virtually the identical effect upon the over-all 
distributions of the MMPI scales. 


Fig. 1. Comparison of mean profiles of Grayson- 
Olinger sample with means of MHCS⁴ sample. 


Test of the Relationship Between MMPI 
Improvement and Clinical Prognosis 


Procedure 


Using an inpatient psychiatric sample, 
Grayson and Olinger found that MMPI im- 
provability was related to early discharge 
from the hospital. To determine whether 
Grayson and Olinger’s finding could be repli- 
cated in the present data, follow-up data were 
obtained for Ss receiving diagnoses other than 
NND. The obtainable information, available 
for 32 of 37 Ss receiving psychiatric diag- 
noses, consisted of the patient’s military status 
at the time of follow-up. For the purpose of 
the present evaluation, the Ss were divided 
into two groups: those receiving premature 
discharges either for psychiatric disease or 
characterological inaptitude or unsuitability; 
and those successfully pursuing their military 
tours of duty. Based upon the Grayson-Olinger 
findings, the following predictions were of- 
fered: (a) On each MMPI scale the dis- 
crepancy score between “real self” and “ideal 
self” is greater for the “in service” (good 
prognosis) group than for the “premature dis- 
charge” (poor prognosis) group. (b) The 
total discrepancy score differentiates between 
the “in service” and “premature” groups. 


⁴ Mental Hygiene Consultation Service, Fort Dix. 


Results and Discussion 


The first hypothesis was tested by t tests 
comparing the means of the discrepancy 
scores for each scale. While none of the t’s 
attained statistical significance, an interesting 
trend in line with the prediction was mani- 
fested. The means of the discrepancy scores 
for the “in service” group were of greater 
value than the corresponding means for the 
“premature discharge” group on eight of the 
nine clinical scales. The probability of ob- 
taining the present distribution of discrepancy 
scores on the basis of chance is .048, which 
warrants a rejection of the null hypothesis at 
the .05 level of confidence. 
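
The logic of testing a count of agreeing directions against chance can be sketched with an exact binomial (sign) test. The text does not name the procedure behind the reported value of .048, so this is an assumption, and the exact binomial figures below need not agree with it.

```python
from math import comb

def sign_test_upper_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials

# Eight of the nine clinical scales fell in the predicted direction.
one_tailed = sign_test_upper_tail(8, 9)   # (9 + 1) / 512
two_tailed = min(1.0, 2 * one_tailed)
```

Under the chance hypothesis each scale is as likely to favor one group as the other, which is why the probability of success is fixed at .5.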

The second hypothesis was tested by com- 
paring the total reduction in T scores for Ss 
in the good prognosis group, with the reduc- 
tion in T scores for Ss in the poor prognosis 
group. The results of this test, which directly 
paralleled the one undertaken by Grayson and 
Olinger, are shown in Table 3. In the Gray- 
son-Olinger study, the “good prognosis” group 
showed a significantly greater percentage of 
individuals meeting each of the three criterion 
scores than did the “poor prognosis” group. 
For the present sample, differences in the 
predicted direction occurred on two of the 
three criterion scores. However, the differ- 
ences did not attain statistical significance as 
measured by chi square, which indicates that 
in the military sample, the ability to improve 
MMPI performance is not related to environ- 
mental prognosis. 


Table 3 


Comparison of the Percentage of Individuals Meeting 
Three Criterion Scores of MMPI Improvement 


Reduction in MMPI 
total T score 


90 or 65 or 


more 


45 or 
more 


Sample 


Grayson-Olinger 
Good prognosis group 
Poor prognosis group 
Military 
Good prognosis group 
Poor prognosis group 


68% 74% 


42% 


87% 
09% 
[Fig. 1 legend: MHCS “Real Self”; Grayson-Olinger “Real Self”; MHCS “Ideal Self”; Grayson-Olinger “Ideal Self”.] 
more — 
19% 3% 
571% 81% 
In comparing the two samples, it can be 
seen that the present sample as a whole mani- 
fested a higher percentage of MMPI improve- 
ment at each successive criterion score. Of 
special significance is the fact that the “poor 
prognosis” group in the present sample showed 
approximately as high a percentage of indi- 
viduals meeting the criterion scores as the 
“good prognosis” group of the Grayson-Ol- 
inger sample. Whether or not the intergroup 
differences in percentages of individuals meet- 
ing improvability criteria are of statistical 
significance cannot be ascertained, as the 
breakdown of the number of cases in Grayson- 
Olinger’s group was not indicated. However, 
another measure of intersample differences in 
improvability was obtainable. In the Grayson- 
Olinger sample, 5 of 45 cases (11%) mani- 
fested completely normal patterns on their 
second MMPI in that all nine clinical scores 
were below the critical range. On the other 
hand, 15 of the 37 patients receiving psychi- 
atric diagnoses in the present sample (42%) 
produced completely improved “ideal self” 
records. A chi square test which compared the 
two samples with respect to the percentage of 
complete improvement gave a value of 10.30, 
which warrants a rejection of the null hypoth- 
esis at the .02 level of confidence. This sta- 
tistically significant finding, along with the 
between group differences shown in Table 3, 
indicates that the military sample manifested 
significantly greater MMPI improvement than 
the Grayson-Olinger sample. Two explana- 
tions of this difference suggested themselves: 
(a) the novel instructional sets employed in 
the two studies produced essentially different 
response sets among the Ss; (b) populational 
differences accounted for the diverse findings. 
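
A chi square comparison of two proportions can be sketched from the counts stated in the text (5 of 45 versus 15 of 37). The formula used in the study is not given, so both the uncorrected and the Yates-corrected forms are shown; neither need reproduce the reported value of 10.30, which may rest on counts or a formula not stated here.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi square (1 df) for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction, floored at zero.
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# Rows: Grayson-Olinger sample, military sample.
# Columns: completely improved, not completely improved.
plain = chi_square_2x2(5, 40, 15, 22)
corrected = chi_square_2x2(5, 40, 15, 22, yates=True)
```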


To determine whether the instructional sets were 
different, the MMPI was administered according to 
the Grayson-Olinger methodology to 12 psychiatri- 
cally diagnosed military patients. Of these Ss, nine 
improved their patterns, and seven manifested com- 
plete improvement. Five of the eight inpatients 
among this group manifested complete improvement. 
The high percentage of complete improvement cor- 
responds to the data obtained under the “ideal self” 
instructional set. It therefore appears that popula- 
tional variables, rather than the differential effects of 
the instructions, determined the intersample differ- 
ence in MMPI improvability. The following are 
among the variables characterizing the military sam- 


Gerald M. 


Rapaport 


ple. For the most part, the sample consisted of young 
men in the 20-30 age range, many of whose prob- 
lems were acute and situational in nature. Moreover, 
as a number of writers have stated (2, 3, 9), the 
motivational component in a number of military psy- 
chiatric cases differs from that found among civilian 
patients as a result of more immediate “secondary 
gain” from the illness. Secondary gain is particularly 
great during times of stress such as combat or—to a 
somewhat lesser degree—basic training. It is quite 
likely that one or more of these variables could ac- 
count for the fact that MMPI improvability, though 
high for this sample, does not relate to environ- 
mental prognosis. One possible explanation would be 
the fact that since the motivational level of psy- 
chiatric Ss in this population is at times directed less 
towards the attainment of the “ideal self” state of 
asymptomatic behavior than towards maintenance of 
“real self” symptomatology, a number of these indi- 
viduals may not actually be striving—at that time— 
towards attainment of the ideal. In short, the ability 
to regard a symptom-free psychological state as 
synonymous with one’s ideal may only be a prog- 
nostically favorable attribute if there is not the 
complicating factor of immediate secondary gain 
from illness. 


Whatever the explanation of the failure of 
the present sample to reproduce the Grayson- 
Olinger findings, it is suggested that the 
Grayson-Olinger prognostic finding may be of 
somewhat restricted generality. Further work 
on this question is needed in order to clearly 
delineate the predictive utility of this prog- 
nostic tool. 


Summary and Conclusions 


The present study investigated the influ- 
ence of an “ideal self” instructional set on 
MMPI performance. Four hypotheses were 
tested: (a) The introduction of an “ideal 
self” set produces changes in the MMPI pat- 
ternings of patients referred for psychiatric 
consultation; (b) these changes are in the di- 
rection of fewer scores in the “critical” or 
“unhealthy” range; (c) the more deviant the 
original MMPI record, the more marked the 
changes; (d) the interindividual variability is 
significantly less on the “ideal self” pattern 
than on the “real self” pattern. 

The MMPI was administered twice to a 
group of 48 military psychiatric patients. The 
first administration of the test was under the 
usual instructional set while the second was 
under an “ideal self” set. The above four hy- 
potheses were confirmed. 
In a study highly similar to the present in- 
vestigation, Grayson and Olinger, using a “typical 
well-adjusted person” instructional set, found 
that MMPI improvability was related to clini- 
cal prognosis. An attempt was made to test 
the generality of Grayson and Olinger’s prog- 
nostic finding. Follow-up data were obtained 
for the majority of military subjects six 
months to one year following their participa- 
tion in the study. It was determined that the 
ability to improve MMPI performance was 
not significantly related to prognosis, although 
the trend was in that direction. Moreover, it 
was noted that the military sample showed a 
significantly greater over-all ability to im- 
prove MMPI performance than did the Gray- 
son-Olinger sample. The possibility that the 
intersample difference in MMPI improvement 
could be attributed to instructional differences 
was ruled out. It therefore appears that popu- 
lational variables account for the between- 
group difference. While further study of the 
Grayson-Olinger prognostic finding is called 
for, it is suggested that these authors’ find- 
ing may be of restricted generality. 


Received November 26, 1957. 
Extraversion, Neuroticism, and Verbal 
Ability Measures¹ 


A. W. Bendig 


University of Pittsburgh 


Several studies have shown that the Mani- 
fest Anxiety Scale tends to show a very low, 
statistically nonsignificant negative correlation 
with verbal ability measures such as the ACE 
test. Since the MAS is highly correlated (r = 
.77) with scores from the Neuroticism scale 
(1) included in Eysenck’s (2) Maudsley Per- 
sonality Inventory (MPI) it might be in- 
ferred that the Neuroticism scale would simi- 
larly show nonsignificant correlations with 
verbal ability tests. Although MAS scores and 
scores from the MPI Extraversion scale are 
significantly related (1), the correlation is low 
enough (r = -.35) that a similar prediction 
cannot be made with much confidence con- 
cerning the relation of Extraversion and ver- 
bal ability scores. 

The MPI and a modified 30-item (syno- 
nyms) version of the Cooperative Vocabulary 
Test were administered to 210 male under- 
graduate students (primarily sophomores) en- 
rolled in an introductory psychology course. 
Stanine scores on the ACE test taken by the 
Ss as entering freshmen were available in the 
files of the university testing service for 108 
of these Ss. Product-moment correlations be- 
tween the raw scores from the MPI and vo- 
cabulary scales were computed for all 210 Ss, 
while correlations between these scores and 
the ACE (total) scores were computed for 
the smaller subgroup of 108 Ss. 

¹ An extended report of this study may be ob- 
tained without charge from A. W. Bendig, De- 
partment of Psychology, University of Pittsburgh, 
Pittsburgh 13, Pa., or for a fee from the American 
Documentation Institute. Order Document No. 5696, 
remitting $1.25 for microfilm or $1.25 for photo- 
copies. 


The correlations of the vocabulary test with 
the Extraversion and Neuroticism scales were 
-.12 and -.12, while the intercorrelation 
between the two MPI scales was -.15. The 
first two coefficients are significant at the .10 
level, and the last correlation is significant at 
the .05 level (N = 210). The correlations of 
the ACE test with Extraversion and Neu- 
roticism were -.01 and -.13 (not signifi- 
cant) with no evidence of curvilinear regres- 
sion (N = 108). The intercorrelation between 
the MPI scales was -.09 (not significant), 
while the correlation between the ACE and 
vocabulary scores was .26 (significant at the 
.01 level). 
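
The significance levels quoted for these coefficients can be checked with the usual conversion of r to t on N - 2 degrees of freedom. The sketch below assumes two-tailed tests, which the text does not state; the function name `t_from_r` is introduced here for illustration.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_vocab = t_from_r(-0.12, 210)    # vocabulary with either MPI scale
t_scales = t_from_r(-0.15, 210)   # Extraversion with Neuroticism
t_ace = t_from_r(0.26, 108)       # ACE with vocabulary
```

In absolute value these come to about 1.74, 2.19, and 2.77, which fall beyond the approximate two-tailed critical values of 1.65 (.10 level), 1.97 (.05 level), and 2.62 (.01 level) for samples of this size.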

The low correlations between the MPI 
scales and the two verbal ability tests found 
for this sample suggest that the MPI is not 
measuring the intellectual aptitudes repre- 
sented by these two tests. This is important 
for two reasons. Such results indicate that 
the Ss selected for experimental studies on 
the basis of extreme scores on the MPI Ex- 
traversion and Neuroticism scales should not 
differ significantly in verbal ability, and any 
differences found on dependent variables, such 
as learning and retention measures, for these 
extreme groups cannot be attributed to intel- 
lectual aptitude differences. The low correla- 
tions also suggest the generalization that 
measures of “personal maladjustment,” such 
as those represented by the MAS and MPI 
Neuroticism scales, are not related to verbal 
ability measures. 

Brief Report. 
Received June 6, 1958. 
In the use of the Stanford-Binet Test of 
Intelligence (SB) with children of low socio- 
economic status, the question arises whether 
or not children are penalized by limited social 
experience and cultural deprivation. Gen- 
erally, verbal test items are thought of as 
being susceptible to degrees and varieties of 
social exposure, while nonverbal items are 
free of this influence. The basic premise is 
that if the verbal items of the SB are socially 
biased with respect to children of low socio- 
economic status, the SB IQ would be arti- 
ficially and significantly depressed below the 
IQ of a nonverbal, nonsocially biased measure. 
The hypothesis to be tested is that, for a 
population of low socioeconomic status, there 
is no significant difference between the Stan- 
ford-Binet IQ and an IQ obtained from a non- 
verbal test of intelligence. 

The SB has verbal and nonverbal items un- 
evenly distributed through the test. McNemar 
(1) reports that the first factor loading of 
vocabulary progressively increases (.59 to 
.91) from experimental age 6 through 18. 
This rules out an analysis of SB item dis- 
crimination on the basis of verbal and non- 
verbal item content. 

Of the nonverbal scales available, the 
Colored Raven Progressive Matrices (CRPM) 
was chosen for the comparative study. CRPM 
has been labeled (4) an “Important British 
Nonverbal Test of g” as a result of factorial 
studies (5). The directions can be given in 
pantomime, and the responses of the child re- 
quire only gross motor pointing. While there 
is no definite evidence that the CRPM items 
are free of social bias, it would be difficult, in- 
deed, to demonstrate that such bias exists. 
The assumption, in this study, is that CRPM 
items have no social bias and that success in 
responding to the items is independent of 
home and school experience. 


Martin and Wiechers (2) present a description and 
rationale of the CRPM in their report of a com- 
parative study of WISC and CRPM scales, using a 
representative sample of 100 Indiana children, nine 
years old. These correlations are reported: CRPM 
IQ with WISC Verbal IQ, r = .84; CRPM IQ with WISC 
Performance IQ, r = .83; and CRPM IQ with WISC 
Full Scale IQ, r = .91. 

Raven (3), working with English children, re- 
ports that at age nine, the CRPM-SB correlation is 
.65. Assuming that the maximum mental age used by 
Raven was 11-0, the maximum IQ of the youngest 
nine-year-old child would be 122, and the maximum 
IQ of the oldest nine-year-old child would be 111. 
This limited ceiling of the CRPM suggests that the 
reported correlation is artificially lower than the 
actual relationship. An assumption in this study is 
that, using a sample of 7-, 8-, and 9-year-old chil- 
dren, the CRPM IQs correspond quite closely with 
SB IQs and WISC IQs. 
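
The ceiling figures above follow from the ratio IQ, taken here as 100 times mental age over chronological age. The sketch assumes that definition and assumes the oldest nine-year-old to be aged 9-11; neither is spelled out in the text.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ: 100 * MA / CA, rounded to the nearest whole point."""
    return round(100 * mental_age_months / chronological_age_months)

ceiling_ma = 11 * 12                          # maximum mental age, 11-0

youngest = ratio_iq(ceiling_ma, 9 * 12)       # child aged 9-0  -> 122
oldest = ratio_iq(ceiling_ma, 9 * 12 + 11)    # child aged 9-11 -> 111
```

A child whose true standing lies above these values cannot be distinguished from one exactly at the ceiling, which is the restriction of range the paragraph refers to.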


Method 


The sample studied consisted of 789 pupils, 
with an age range from seven years through 
nine years eleven months. These pupils at- 
tended public schools serving the lowest socio- 
economic areas of a northeastern city. All 7-, 
8-, and 9-year-old children in the first three 
grades of these schools were studied plus the 
children from five fourth-grade classes. The 
sample contained the following distributions: 
349 Negro children, 440 white children; 389 
boys, 400 girls; 271 aged 7, 273 aged 8, and 
245 aged 9. They were tested individually, 
first with the SB Form L followed by the 
CRPM. The scores of the CRPM were slightly 
extrapolated to give a mental age range from 
four years six months to eleven years six 
months. 

Means and variances were calculated for 
the SB scores and the t test used to determine 
the significance of the difference of means. 
Results were compared on the basis of age, 
sex, color, color-sex (Negro boys vs. Negro 
girls, etc.), age-color-sex (seven-year-old Ne- 
gro boys vs. seven-year-old white boys, etc.), 
and SB IQ levels (SB below 70, 70-89, etc.). 
CRPM scores were analyzed in the same 
manner. 
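
A test of the difference between two group means can be sketched from summary figures alone. The example uses the seven- and eight-year-old SB values given in the Results and Table 1 (mean difference 2.44; SDs 13.6 and 14.8; 271 and 273 children). The unpooled standard error is an assumption, since the exact formula used in the study is not stated.

```python
from math import sqrt

def t_from_summary(mean_diff, sd_a, n_a, sd_b, n_b):
    """t for a difference of independent means, unpooled standard error."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return mean_diff / se

# Seven- versus eight-year-olds, SB IQ.
t_7_vs_8 = t_from_summary(2.44, 13.6, 271, 14.8, 273)

# Eight- versus nine-year-olds, SB IQ (SDs 14.8 and 13.9; 273 and 245 children).
t_8_vs_9 = t_from_summary(2.73, 14.8, 273, 13.9, 245)
```

Both values come out close to the t's of 2.00 and 2.17 reported in the Results, small differences being attributable to rounding of the published means and standard deviations.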

Each child’s difference score (SB IQ minus 
CRPM IQ) was treated as a raw score and 
the significance of the difference of the means 
of the differences determined for the above 
groupings. 


Results¹ 


Stanford-Binet mean differences were slight 
and can be attributed to chance for the follow- 
ing groups: Negro vs. white children, boys vs. 
girls, Negro boys vs. Negro girls, white boys 
vs. white girls, and age-color-sex groups. 
There was a tendency for the SB means to 
decrease as age increased. The difference of 
SB means for the seven- and eight-year-old 
children was 2.44 IQ points, with a t value 
of 2.00, which is significant at the .05 but not 
the .01 level of confidence. The mean of the 
eight-year-old group was 2.73 points higher 
than that of the nine-year-old group. The t 
for this difference was 2.17, significant at the 
.05 level of confidence. However, the decrease 
in IQ with age was comparable for Negro and 
white children and for boys and girls. Early 
school retardation may account, in part, for 
these differences. The inference is that in this 
population of children from low socioeconomic 
level, intelligence as measured by the SB is 
comparable for color, sex, color-sex, and age- 
color-sex groups. Hence, the basic assumption 
is made that there are no real differences in 
intelligence as measured by the SB for these 
Negro and white boys and girls. 


¹ Tables of SB and CRPM raw scores and tables 
of SB, CRPM, and difference score means, variances, 
and group totals by SB IQ levels have been de- 
posited with the American Documentation Institute. 
Order Document No. 5739, remitting $1.25 for micro- 
film or $1.25 for photocopies. 

Significant differences of CRPM IQ means 
were found when groups were compared on 
the basis of color. Table 1 shows that the 
white children had a mean CRPM IQ which 
was 10.29 points higher than the mean of 
the Negro children. This difference was sig- 
nificant well beyond the .01 level of con- 
fidence, having a t value of 8.23. This is in 
gross contrast to the SB IQ results where no 
significant differences were found. 

When the data were analyzed to determine 
whether this relationship held for varying SB 
IQ levels, the results were similar. There were 
significant differences between the means of 
Negro and white children for all of the SB 
IQ groupings (SB below 70, 70-89, 90-109, 
110-129, 130 and above). The white chil- 
dren’s means were higher than the Negro 
children’s means and the differences were 
significant at the .01 level of confidence, ex- 
cept for the SB group of 110-129. This latter 
difference was significant at the .02 level of 
confidence. 

When CRPM IQ mean differences were 
considered for age-color-sex groups (seven- 
year-old Negro boys vs. seven-year-old white 
boys, seven-year-old Negro girls vs. seven- 
year-old white girls, etc.) the t values of the 
differences were all above 3.00 with the ex- 
ception of the nine-year-old Negro and white 
boys. The latter difference had a t value of 
1.80. 

There was no significant difference between 
the CRPM means of boys and girls nor be- 
tween Negro boys and girls or white boys and 
girls. As with the SB IQs, there was a tend- 
ency for the CRPM means to decline with 
age. The mean difference between seven- and 
eight-year-olds was significant at the .05 level 
of confidence. The mean of the nine-year-old 
group was lower than the eight-year-old mean, 
but the difference was not significant. 

The difference scores (each child’s SB IQ 
minus his CRPM IQ) give a picture of the 
discrepancy between individual IQs on the 
two tests. It is to be remembered that for the 
varying groups, the SB IQ means were es- 
sentially the same. Here, again, color differ- 
ences were significant well beyond the .01 
level of confidence. 


Comparison of Stanford-Binet and Raven Matrices 


Table 1 

Means, Standard Deviations, and t’s for SB IQs, CRPM IQs, and Difference Scores by 
Color, Age, and Sex Groupings 

                          SB IQ                CRPM IQ            Diff. scoresᵃ
Groups             N   Mean    SD  t        Mean    SD  t        Mean    SD  t
Negro            349   90.3  13.2  0.27     80.5  16.3  8.23**    9.8  14.8  9.26**
White            440   90.6  15.1           90.8                 -0.2  15.6

Boys             389   89.8  13.7  1.25     86.9        1.05      2.9  16.2  2.32*
Girls            400   91.1  14.7           85.5                  5.6  15.8

7 Years          271   92.9  13.6  2.00*    88.8        2.11*                0.71
8 Years          273   90.5  14.8           85.3  19.6            5.1  16.5

8 Years          273   90.5  14.8  2.17*    85.3  19.6  0.61      5.1  16.5  1.24
9 Years          245   87.7  13.9

Negro boys       164   90.2  12.8  0.07     81.8  16.7  1.46                 1.67
Negro girls            90.3  13.6

White boys       225   89.5  14.4  1.52     90.6  18.7  0.15                 1.29
White girls            91.7  15.7

Negro boys, 7     50   91.9  12.3  0.26     82.8  14.7                       3.99**
White boys, 7          92.5  12.2

Negro boys, 8     61   90.9  13.3  0.79     80.5  17.8                       4.08**
White boys, 8          89.0  15.0

Negro boys, 9     53   88.0  12.2  0.19     82.4  17.0  1.80                 2.29*
White boys, 9          87.6  15.0

Negro girls, 7    71   92.5  13.9  0.71                          11.2  14.0  4.62**
White girls, 7    83   94.2  15.1           94.2  18.3            0.0  16.0

Negro girls, 8    61   89.7  13.0  1.06                          11.4  15.1  3.55**
White girls, 8    73   92.4  16.7           90.1  20.8            2.3  14.3

Negro girls, 9    53   88.2  13.3  0.33                          10.6  12.1  3.78**
White girls, 9    59   87.3  14.2           87.2  17.7            0.1  17.0

* Significant at the .05 level of confidence. 
** Significant at the .01 level of confidence. 
ᵃ Each child's SB IQ minus his CRPM IQ. 
Cells left blank are illegible in the source scan. 


The average number of points by which 
the Negro children’s SB IQs exceeded their 
CRPM IQs was 9.83. For the white children, 
the average difference was .17 favoring the 
CRPM. The t value for these differences was 
9.26, a significant finding when one considers 
that, for groups of this size, the t value of 
2.58 is significant at the .01 level of con- 
fidence. The results, when analyzed by age- 
color-sex groups, were similar. The Negro 
children, as a group, obtained significantly 
higher SB IQs than CRPM IQs. The IQs of 
the white children were similar on the two 
tests. Therefore, we accept the null hypothe- 
sis that there is no significant difference be- 
tween verbal and nonverbal intelligence test 
scores for low socioeconomic status white chil- 
dren, and we reject the null hypothe- 
sis that there is no significant difference be- 
tween verbal and nonverbal intelligence test 
scores for low socioeconomic status Negro 
children. 

No significant differences were found be- 
tween Negro boys and girls nor between white 
boys and girls. There were no differences on 
the basis of age groups. This is probably ac- 
counted for by the decrease in both SB IQs 
and CRPM IQs with increased age. 

The analysis of difference scores revealed 
a sex difference. The girls averaged 5.56 points 
lower in their CRPM IQs, and the boys were 
2.91 points lower. This difference was signifi- 
cant at the .05 level of confidence. Negro girls 
tended to have CRPM IQ scores falling lower 
below their SB IQ scores than did Negro 
boys, and the white girls tended to have lower 
CRPM IQs, while the white boys had differ- 
ences slightly favoring the SB IQs. 


Summary 

In a population of 789 children of low 
socioeconomic status, with an age range of 
seven years to nine years eleven months, no 
evidence was found to support the hypothesis 
that social bias in the verbal items of the SB 
depressed the SB IQ below the nonverbal, 
nonsocially biased CRPM IQ. 

Mean SB IQs were similar for Negro and 
white children, boys and girls, and for group- 
ings within the three age levels. The CRPM 
discriminated on the basis of color, with the 
Negro mean scores in all instances lower than 
the white mean scores. 

No significant difference was found for 
white children between their SB IQ, which 
contains many verbal items, and their non- 
verbal CRPM IQ. However, Negro children’s 
CRPM IQ means were significantly lower 
than their SB IQ means. 

It is suggested that the CRPM cannot be 
considered a test of intelligence or a measure 
of g but is rather a measure of a specific skill. 
The present findings suggest that intelligence 
tests heavily loaded with nonverbal items may 
discriminate against Negro children. 


Received December 10, 1957. 
Trail Making Test as a Screening Device for 
the Detection of Brain Damage 


Earl C. Brown,1 Albert Casey, Ralph I. Fisch, 
and Charles Neuringer 
VA Center, Wadsworth, Kansas 


The Trail Making Test is a relatively un- 
known instrument, although over the years 
several investigators have used it in their 
studies. As best can be determined, the 
Trail Making Test dates back to a prototype 
originally known as the Taylor Number Se- 
ries,2 a test which required the subject to 
draw lines connecting a series of numbers 
from 1 to 50 that were scattered randomly 
about a rectangular sheet of paper. In 1938, 
Partington (4) renamed the test “A Test of 
Distributed Attention” and constructed forms 
which were modifications of the above. He 
viewed the test mainly as one that dealt with 
the speed of motor performance. Further use 
of the test, however, led Partington to con- 
clude that a successful performance demanded 
that the subject manifest an integration of 
previously learned material into a pattern of 
response that involved shifts in organization, 
recall, and recognition, as well as motor per- 
formance. Later, Partington and Leiter (4) 
administered the test to an unselected adult 
male population with the notion of develop- 
ing a standardized procedure of administra- 
tion and scoring. When the results were scru- 
tinized, the authors concluded that the test 
was measuring general mental ability. They 
demonstrated that the scores on the test were 
in general agreement with the estimated in- 
telligence of the individuals to whom it was 
administered. At that time the test was called 
the Partington Pathways Test. 

1 Currently Area Chief Psychologist, VA Area 
Medical Office, Columbus, Ohio. 
2 J. E. Partington. Personal communication. 1956. 

A follow-up study (4) in 1945 on 256 un- 
selected World War II veterans ranging in 
age from 19 to 36 years, who had also 
taken a Stanford-Binet 1937 Revision, re- 
vealed a .68 correlation between the Path- 
ways and the latter. The test was then in- 
corporated as a performance subtest in the 
Army Individual Test of General Ability (7). 
Again it was rechristened and went under the 
title of the Trail Making Test. As a part of 
the Army battery, it underwent an initial 
validation study conducted on a sample of 
465 soldiers, both white and Negro (6). The 
reliability of the Trail Making was found to 
be .84 for the white group, with the figure 
for the Negroes being essentially the same. 
In final battery form, the test correlated .65 
with the Army General Classification Test. 
This version of the Trail Making Test was used 
in the present study. The War Department 
released it from classified status and granted 
the right to reproduce the forms locally. As 
used in the Army, the goal was primarily one 
of quick screening of level of intellectual func- 
tioning. It was felt that the test would pro- 
vide an index that was more accurate than 
one that could be obtained from a group test. 
However, it was also used on clinical popula- 
tions and was deemed to have some predictive 
value in differentiating between neu- 
rotic and schizophrenic groups (8). For civil- 
ian purposes, the test has also been put to 
use by Leiter and Partington (3) as a sub- 
test in an adult performance intelligence scale. 
Weaver (11) specifically applied the scale at 
the college level and found that the Trail 
Making correlated .62 with Q scores of the 
ACE. 

At the same time, Leiter and Partington 
felt that the test might have some relevance 
with regard to the evaluation of brain in- 
jury. Watson (10) devised a qualitative 
check list for use in gauging the Pathways’ 
possible value in the evaluation of organics. 
Armitage (1), employing an altered adminis- 
tration procedure, attempted to use the Trail 
Making Test as a part of a battery designed 
specifically for the purpose of evaluating brain 
injury. He felt that the test would measure a 
number of functions such as ability to main- 
tain a double relationship, ability to plan 
foresightfully, ability to make shifts, etc. 
Using the test, he was able to show, to a 
highly significant degree, differences between 
brain damaged Ss and controls composed of 
normals and neurotics, the controls making 
fewer errors and showing a greater ability to 
recognize and correct them. Scherer and 
Winne,3 however, using the Trail Making 
over a five-year period as a part of a follow-up 
study on lobotomized patients, were not able 
to discover any relationship between Trail 
Making performance and this type of cerebral 
deficit. Humphries,4 using a considerably dif- 
ferent administration and scoring procedure, 
applied the test in group form to organic, 
schizophrenic, and normal subjects who were 
matched on the basis of vocabulary scale 
scores. On this basis, he was able to dis- 
tinguish both the schizophrenic and organic 
groups from the normals as well as from each 
other. 

Finally, Reitan5 used the test on a group 
of highly selected organic patients in whom 
brain damage had been established under in- 
tensive and thorough diagnosis. Originally, he 
studied 84 subjects without evidence of brain 
damage as a control group and compared 
them with 200 subjects of known brain dam- 
age. The test was able to discriminate well 
between the groups, with only 16.7% of the 
controls showing as false positives. In a more 
rigorous follow-up study, Reitan (5) admin- 
istered the test individually to 27 patients 
with brain damage and 27 patients without 
brain damage. He obtained results which sig- 
nificantly differentiated the two groups (p 
< .001). 
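
The false-positive figure quoted above can be turned back into a head count with simple arithmetic. The sketch below assumes only the two numbers given in the text (84 controls, 16.7% false positives); the function name is illustrative.

```python
def false_positive_count(n_controls, false_positive_rate):
    """Number of controls misclassified, rounded to the nearest whole person."""
    return round(n_controls * false_positive_rate)


controls = 84
rate = 0.167  # 16.7% of the controls, as reported in the text

misclassified = false_positive_count(controls, rate)
correctly_classified = controls - misclassified
```

On these figures about 14 of the 84 controls would have been classed with the brain-damaged group, leaving 70 correctly identified.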

3 I. W. Scherer and J. F. Winne. Personal com- 
munication. 1956. 
4 C. C. Humphries. Personal communication. 1957. 
5 R. M. Reitan. Personal communication. 1956. 
It seems clear, then, that Reitan’s, Hum- 
phries’, and Armitage’s findings are suggestive 
of the discriminative and fruitful application 
of the test to brain injury. However, the ap- 
proach of Reitan and Armitage involved time- 
consuming definition of brain injury as well 
as a projected use of the test for intensive 
diagnostic purposes. The present study, in- 
stead, emphasized its possibilities as a short, 
simple, and economical screening device for 
the detection of brain damage. The test was 
brought into the clinical setting and admin- 
istered to the patient population as it nor- 
mally presented itself for diagnostic evalua- 
tion, in the hope that such a procedure would 
shed additional light on the test’s value as a 
practical diagnostic tool. 


Procedure 


The Trail Making Test was routinely ad- 
ministered to all patients referred to the 
Clinical Psychology Section by the Psychiatry 
and Neurology Service, General Medical and 
Surgical Hospital, and Domiciliary, at the VA 
Center, Wadsworth, Kansas. The test was ad- 
ministered by members of the Psychology Sec- 
tion from December, 1954, to October, 1956. 


Materials 


The materials used were two test sheets, 
Part A and Part B of the Trail Making, and 
two pencils with erasers. The examiner kept a 
pencil in his right hand so that he could point 
to the spots indicated in the directions. He 
did not draw any of the lines for the examinee 
but confined himself to pointing. The test was 
administered in two parts, Part A and Part B. 
If Part A was failed, Part B was not admin- 
istered. The time required for completion of 
each part was noted. Part A required Ss to 
draw lines connecting Nos. 1 through 25 in 
their proper sequence. Part B required follow- 
ing the proper number-letter sequence, 1-A- 
2-B-3-C ... 12-L. The numbers as well as the 
letters in the case of Part B were enclosed in 
small circles and scattered randomly over the 
test sheets. The standard administration pro- 
cedure as described in the Army Individual 
Test Manual (7) was used. 
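
The two target sequences described above can be written out explicitly. This is an illustrative sketch only; the function names are hypothetical, and the end points (25 for Part A, 12 and L for Part B) are taken from the description in the text.

```python
import string


def part_a_sequence(last_number=25):
    """Part A: the numbers 1 through 25 in ascending order."""
    return [str(n) for n in range(1, last_number + 1)]


def part_b_sequence(last_number=12):
    """Part B: numbers and letters alternating, 1, A, 2, B, and so on."""
    sequence = []
    for number, letter in zip(range(1, last_number + 1), string.ascii_uppercase):
        sequence.append(str(number))
        sequence.append(letter)
    return sequence


part_a = part_a_sequence()
part_b = part_b_sequence()
```

Part B demands a shift between two ordered series at every step, which is the feature the earlier investigators had in mind when they spoke of "distributed attention."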


Trail Making Test and Screening for Brain Damage 


Scoring 


For each part passed, the time—in terms of 
seconds—was noted on the space of the test 
blank. If the S failed, a minus or failure sign 
was written and the time not recorded. 

The scores used for this study were of 
three types: (a) time in seconds to complete 
Part A, (b) time in seconds to complete Part 
B, and (c) space score B, indicating the point 
in the number-letter sequence at which failure 
occurred. 
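
The scoring rules above can be summarized as a small record. The sketch below is one possible representation, not the Army manual's scoring form; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrailMakingRecord:
    time_a: Optional[int]         # seconds for Part A; None if Part A was failed
    time_b: Optional[int]         # seconds for Part B; None if failed or not given
    space_score_b: Optional[int]  # point reached in Part B when failure occurred


def score_subject(time_a, time_b=None, failure_point_b=None):
    """Build a record following the rules described in the text."""
    if time_a is None:
        # Part A failed, so Part B is not administered and no time is recorded.
        return TrailMakingRecord(None, None, None)
    if time_b is not None:
        # Part B passed: a time score is recorded and no space score is needed.
        return TrailMakingRecord(time_a, time_b, None)
    # Part B failed: only the space score is kept for Part B.
    return TrailMakingRecord(time_a, None, failure_point_b)


passed_both = score_subject(time_a=45, time_b=110)
failed_part_b = score_subject(time_a=60, failure_point_b=7)
failed_part_a = score_subject(time_a=None)
```

Because a time score exists only for a completed part, the number of usable Ss differs between Part A and Part B, as the Subjects section notes.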


Subjects 


The diagnostic criterion was the final medi- 
cal diagnosis assigned by the individual physi- 
cian in charge of each case. 

The subject population for Part A of the 
test was composed of 39 organic brain-dam- 
aged patients, 14 psychotic patients, 29 other 
NP (neuro-psychiatric) patients with neuroses 
and character and behavior disorders, and 28 
normal patients. For Part B, 16 organic, 7 
psychotic, 13 other NP, and 15 normal pa- 
tients were available. The difference in num- 
ber of Ss used for each part of the test was 
due to the scoring procedure which yielded a 
time score only if the given part was com- 
pleted. 


Results 


The analysis of Ss’ performance in terms 
of seconds on Part A of the Trail Making 
Test is summarized in Table 1. In order to 
meet the assumptions underlying the Analysis 
of Variance test, it was necessary, first, to 
apply a reciprocal transformation to the raw 
scores. Thus, the data were analyzed in the 
form of reciprocals of seconds X 10,000. Also, 
inasmuch as the subclass frequencies were un- 
equal, it was necessary to apply special tech- 
niques for computing the sums of squares 
(9, p. 381). 


Table 1 

Summary of Analysis of Variance for Part A 
of the Trail Making Test 

Source of variation          Mean square      F       p 
Diagnosis                      11275.7       5.79 
Age                             4468.9       2.29    >.05 
IQ                             17104.1       8.78    <.01 
Error                           1947.8 
Interaction 
  Age X IQ                      4563.9       2.34    >.05 
  Age X Diagnosis               1677.7        .86    >.05 
  IQ X Diagnosis                2872.0       1.47    >.05 
  Age X IQ X Diagnosis          2873.0       1.47    >.05 

[The df column and the p entry for Diagnosis are not legible in the source.] 


Table 2 

Frequency Distribution of Scores on Part A of the 
Trail Making Test for the Various 
Diagnostic Categories 

Seconds (reciprocals X 10,000)    Organic    Psychotic    Other NP    Normal 

[Scores are grouped in intervals of 20, from 41-60 through 381-400, with a final open interval of 401 and above. The cell frequencies are not legible in the source.] 


Significant differences were found to exist 
between the diagnostic categories and between 
the high and low IQ groups, the high IQ group 
taking significantly less time to complete this 
part of the test. 

The means of the diagnostic groups were: 
organic 127.4; psychotic 130.5; other NP 
248.8; normal 217.6. To determine the locus 
of the significant differences obtained between 
the diagnostic groups, the means were treated 
by Duncan’s multiple range test (2). Because 
of unequal Ns, calculations were based on the 
smallest group with an N of 14. According to 
Duncan’s test, the minimum difference be- 
tween means for significance at the 5% level 
is no greater than 35.99. By this criterion, the 
differences between all means are significant 
with two exceptions: the difference between 
the organic and psychotic groups, and the 
difference between the other NP and normal 
groups. 
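
The reciprocal transformation and the pairwise comparison of means described above can be sketched as follows. This is a simplification, not Duncan's full procedure: Duncan's test uses a separate critical range for each number of steps between ranked means, whereas the sketch applies the single figure of 35.99 quoted in the text to every pair. The function names are illustrative.

```python
from itertools import combinations


def reciprocal_score(seconds):
    """Transform a time in seconds into reciprocal X 10,000, as in the text."""
    return 10000.0 / seconds


# Group means on the transformed scale, as reported in the text.
group_means = {
    "organic": 127.4,
    "psychotic": 130.5,
    "other NP": 248.8,
    "normal": 217.6,
}

# Largest minimum difference quoted for significance at the 5% level.
MINIMUM_DIFFERENCE = 35.99


def pairwise_significance(means, threshold):
    """Mark each pair of means as differing by more than the threshold or not."""
    return {
        (a, b): abs(means[a] - means[b]) > threshold
        for a, b in combinations(means, 2)
    }


results = pairwise_significance(group_means, MINIMUM_DIFFERENCE)
not_significant = [pair for pair, differs in results.items() if not differs]
```

On the transformed scale a higher score means a faster performance, so a time of 80 seconds becomes 125.0. Applied to the reported means, the sketch leaves only the organic-psychotic and other NP-normal pairs below the threshold, which agrees with the two exceptions named in the text.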

Table 2 represents the frequency distribu- 
tion of the seconds scores on Part A. 

Obviously, the large amount of overlap be- 
tween the groups, especially the organic and 
psychotic groups, rules out the possibility of 
establishing a meaningful cutting score that 
would distinguish clearly between the organic 
and the other diagnostic groups of Ss in the 
present population. Treatment of Ss’ scores 
in terms of seconds on Part B of the test is 
outlined in Table 3. 

Due to the relatively small Ns present in 
this analysis, it was necessary to pool and 
consider as a unit all nonorganic groups of Ss, 
i.e., the psychotic, other NP, and normal 
groups. As is seen from inspection of Table 3, 
the only comparisons yielding significant dif- 
ferences were between the low and high IQ 
groups and between the two age groups. 
Again, it is clear from the frequency distribu- 
tion of these scores (Table 4) that the estab- 
lishment of a practical cutting score on the 
basis of the present data is not possible. 

Finally, in order to compare more clearly 
the present results with Reitan’s (5) findings, 
a frequency distribution composed of coded 
scores for Part A plus Part B is given in 
Table 5. Such scores are obtained by applica- 
tion of the Army Test Manual scoring pro- 
cedure (7), the same procedure that Reitan 
followed in his studies. Ss failing Part B were 
included in this table, in which case a score of 
zero was given for Part B. 


Table 3 

Summary of Analysis of Variance for Part B 
of the Trail Making Test 

Source of variation          df    Mean square      F       p 
Diagnosis                     1        360.4        .79    >.05 
Age                           1        319.0        .70 
IQ                            1       3164.1       6.9     <.01 
Error                        43        453.5 
Interaction 
  Age X IQ                    1         94.6        .21    >.05 
  Age X Diagnosis             1          2.3        .005   >.05 
  IQ X Diagnosis              1          6.7        .01    >.05 
  Diagnosis X Age X IQ        1          1.1        .002   >.05 

[The p entry for Age is not legible in the source.] 


Table 4 

Frequency Distribution of Scores on Part B of the 
Trail Making Test for the Various 
Diagnostic Categories 

Seconds (reciprocals X 10,000)    Organic    Psychotic    Other NP    Normal 

[Only the frequencies below are legible in the source, and their column positions were lost in reproduction.] 

-20        1 
21-30      1  1 
31-40      2 
41-50 
51-60      1  2 
61-70      4 
71-80      5  2 
81-90      2 
91-100 
101-110    1  1 
111-120 
121-130 
131-140    1  1 
141-150    1 
151-160    1 
161-       1  3  1 


Reitan used a cutting score falling between 
12 and 13. Eighty-three per cent of his con- 
trol group, composed of normal and other NP 
Ss, fell above this score in contrast to only 
17% of the brain-damaged group tested. The 
present data, however, present quite a differ- 
ent picture. Inspection of Table 5 indicates an 
exorbitant number of false positives asso- 
ciated with the use of this cutting score. It is 
clear from the frequency distributions of 
Part A and Part B scores (Tables 2 and 4), 
considered separately or together, that this 
circumstance is due primarily to the inclusion, 
in the present data, of the scores of the psy- 
chotic group which closely parallel the scores 
of the organic group. 
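
The way a cutting score produces false positives can be made concrete with a short sketch. It assumes, from the description above, that coded scores below the cutting score point toward brain damage; the coded scores listed are hypothetical and are not the study's data.

```python
# Reitan's cutting score lies between 12 and 13; the midpoint is used here.
CUTTING_SCORE = 12.5


def flagged_as_brain_damaged(coded_score):
    """A coded score below the cutting score is read as suggesting brain damage."""
    return coded_score < CUTTING_SCORE


def false_positive_rate(nonorganic_scores):
    """Proportion of nonorganic Ss who fall below the cutting score."""
    flagged = sum(flagged_as_brain_damaged(score) for score in nonorganic_scores)
    return flagged / len(nonorganic_scores)


# Hypothetical coded scores for nonorganic Ss, for illustration only.
hypothetical_nonorganic = [18, 14, 9, 12, 16, 7, 13, 11]

rate = false_positive_rate(hypothetical_nonorganic)
```

When a nonorganic group includes many low scorers, as the psychotic Ss were in the present sample, the rate computed this way rises sharply even though the cutting score itself is unchanged.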

Table 5 

Frequency Distribution of Coded Scores on Part A 
plus Part B of the Trail Making Test 

Scores*    Organic    Nonorganic 

[Scores are listed from 20 downward. The cell frequencies are not legible in the source; N = 39 is the only column total that survives.] 

* Reitan’s cutting score is between 12 and 13. 


An additional measure, space scores, was in- 
vestigated for the purpose of testing the hy- 
pothesis that, among the Ss failing Part B, 
the organic Ss would make errors earlier in 
the number-letter sequence than would the 
nonorganic Ss. A space score simply repre- 
sented the last number correctly reached be- 
fore an error occurred. Unfortunately, due to 
the short range within which these scores may 
fluctuate, they could neither be normalized 
nor ranked properly for accurate statistical 
evaluation. Nevertheless, inspection of the fre- 
quency distribution of these scores indicated 
that this measure offered little if any advan- 
tage over the previous measures considered. 


Summary and Conclusions 


The present study was undertaken in the 
hope of finding in the Trail Making Test a 
shorter and more economical method of screen- 
ing for organic brain damage. A number of 
earlier studies, especially those of Reitan, 
Humphries, and Armitage, strongly suggested 
the possibility of such an application for this 
test. The approach, although less rigorous 
than that used in the previously mentioned 
studies, had the advantage of more closely 
approximating the conditions under which the 
problem of practical application of the test 
could be subjected to direct inquiry, i.e., con- 
ditions of clinical practice. In this respect, the 
findings were essentially negative. 

1. Time scores for Part A of the Test were 
found to significantly differentiate between 
the high and low IQ groups, as well as be- 
tween the organics and psychotics as one 
grouping and the normals and other NP pa- 
tients as another. 

2. Part B was found to differentiate only 
between the high and low IQ groups and be- 
tween the high and low age groups. 

3. Frequency distributions yielded by both 
measures, considered separately or together 
in the form of coded scores, did not offer the 
possibility of a cutting score that would have 
discriminative diagnostic value with respect 
to the present population. 

4. An additional measure considered, space 
scores, was likewise found to lack discrimina- 
tive value. 

Inasmuch as the diagnostic problem fre- 
quently encountered in clinical practice is one 
of distinguishing between psychosis and brain 
damage, the present findings clearly suggest 
a verdict of “no value” for the Trail Making 
Test in this context. Suggested, further, is 
that any application of the test to a popula- 
tion of the type presently considered must 
take into account factors of age and IQ. 

It is interesting to note that, contrary to 
the experimenters’ expectations, Part A seemed 
more sensitive to diagnostic categories than 
Part B. Part A certainly requires less “dis- 
tributed attention.” It seems rather to more 
closely approximate a simple speed-of-reac- 
tion task. 

Finally, it is suggested that if the merit 
and possible applications of the Trail Making 
Test are to be determined, certain areas of in- 
vestigation should be pursued. Among the 
most outstanding is the possibility for de- 
veloping methods of analyzing errors in a 
more qualitative fashion. Also, relationships 
between age, IQ, method of establishing diag- 
nosis, and Trail Making scores need further 
understanding. 
Characteristics of Volunteers and Nonvolunteers 
in Psychological Experimentation1 


R. M. Martin and F. L. Marcuse 
State College of Washington 


A social psychology or generalized conclu- 
sion based on experimental investigations of 
freshmen or sophomores in college is highly 
debatable. Use of volunteers from such fresh- 
men or sophomore classes is even more de- 
batable for it may represent a selection within 
a selection. In experimental work it is also a 
wise precaution to know as many character- 
istics as possible of the group being investi- 
gated. 

Studies have been made of the scores of 
volunteers on visuo-motor learning (4), the 
type of person requesting volunteers (5), 
marital adjustment of volunteers (15), as 
well as the nature of the volunteering act it- 
self (2, 9, 10, 11). Maslow and Sakoda (8) 
found volunteers to be higher in a scale of 
self-esteem than nonvolunteers and, as a re- 
sult of their findings, concluded that the 
Kinsey data on sex (6, 7) should allow for 
volunteer error. Siegman (12), however, using 
a number of personality variables, as well as 
the self-esteem scale used by Maslow, found 
no differences between volunteers and non- 
volunteers for a Kinsey-type interview. 

The present study proposed to examine the 
personality characteristics of volunteers and 
nonvolunteers for different experimental situa- 
tions. Reliability of the volunteering act was 
also to be examined. 


Procedure 


1 The scope of this investigation was made pos- 
sible by Grant 3220-110 of the State College of 
Washington. This study was done in partial fulfill- 
ment of the requirements for the degree of master 
of science in psychology. 

Approximately 400 introductory psychology 
students at the State College of Washington 
were used during the different phases of this 
study. Four steps, each separated by approxi- 
mately one week, were involved. They were 
as follows: Step 1 involved the administering 
of a battery of tests. Such tests consisted of 
the Taylor Manifest Anxiety Scale (13), the 
Levinson E (ethnocentrism) Scale (1), and 
the administration of the Bernreuter Person- 
ality Inventory of which four scales were 
used (3). Scores made on the American 
Council on Education Psychological Exami- 
nation (14) were also obtained. Step 2 re- 
quested volunteers. One of the experimenters 
(Martin) met with each of 12 sections and 
presented a volunteer request sheet for one 
of four experimental situations. The situations 
were as follows: One asked for volunteers for 
an experiment dealing with learning, a second 
for personality, a third for attitude to sex, 
and the final one asked for volunteers for 
hypnosis. Three sections were used for each 
specific experimental situation. The written 
request sheet asking for volunteers was made 
as traditional as possible and is reproduced 
here verbatim: 


The Department of Psychology would like to know 
if you can be a volunteer for an experiment dealing 
with [at this point either learning, personality, atti- 
tude to sex, or hypnosis were inserted]. This experi- 
ment will take approximately two hours of your 
time. Volunteering or not volunteering will have no 
effect on your grade in this course. On the sheet be- 
low please sign your name if you can be a volunteer. 
For those of you who indicate you can be volun- 
teers, contact will be made at a later date to inform 
you of the different times available and of the place. 


Step 3 was designed to check the reliability 
of volunteering. The procedure decided on 
was as follows. A week after the original re- 
quest, the same experimenter again met the 
12 sections and informed them that on the 
basis of preliminary data it had been found 
necessary to redesign the original experiment 
and that it was essential to start from scratch 
(this explanation was not questioned by any- 
one). Request sheets for volunteers were 
again passed out. Apart from this modifica- 
tion the written request was exactly as above. 
The fourth and final step involved meeting 
each class and informing them that the ex- 
perimental situation for which they had vol- 
unteered would not be held. The real nature 
of the experiment was explained, results were 
promised, and they were thanked for their 
cooperation. 

The number of Ss for each experimental 
situation varies2 inasmuch as the tests and 
requests for volunteers were made at different 
times. These figures are given in Table 1. 

2 The per cent of volunteers (the N’s ranged from 
84 to 95) for personality, attitude to sex, hypnosis, 
and learning was 43, 40, 38, 26, respectively. No 
differences between males and females existed ex- 
cept in the experiment calling for volunteers for atti- 
tude to sex; here males predominated. 


Table 1 

Number of Volunteers (V) and Nonvolunteers (Nv) for Each Personality Variable for 
Different Experimental Situations 

                  Intelligence     Anxiety (13) and      Bernreuter Per- 
                      (14)         Ethnocentrism (1)    sonality Scales (3) 
Situation           V     Nv          V     Nv             V     Nv 
Learning 
  Males             9     31          9     32            12     31 
  Females          12     26         13     30            11     24 
  Combined         21     57         22     62            23     55 
Personality 
  Males            13     22         13     24             8     24 
  Females          21     22         23     24            25     23 
  Combined         34     44         36     48            33     47 
Sex 
  Males            22     23         27     25            23     25 
  Females          11     33         11     32             7     31 
  Combined         33     56         38     57            30     56 
Hypnosis 
  Males            13     26         13     27            11     36 
  Females          20     30         23     31            21     19 
  Combined         33     56         36     58            32     55 
Total group 
  Males            57    102         62    108            54    116 
  Females          64    111         70    117            64     97 
  Combined        121    213        132    225           118    213 


Results3 

Results are summarized in Tables 2 and 3. 
In Table 2, the t’s are comparisons between 
the mean differences of the scores on the vari- 
ous personality tests of volunteers and non- 
volunteers for each of the four experimental 
situations as well as for the four situations 
considered as a unit. Table 3 indicates re- 
liability obtained for each of the four experi- 
mental situations as well as for the four situa- 
tions considered as a unit. 

Using the .01 level of confidence as a cri- 
terion of significance and the .05 level as 
suggestive of significance, significant dif- 
ferences were not found in any comparison 
between volunteers and nonvolunteers for 
learning, personality, or sex. The combined 
(male and female) anxiety score, however, 
was suggestive in the experiment dealing with 
personality. In the situation requesting volun- 
teers for an experiment dealing with hypno- 
sis,4 male volunteers had a significantly lower 
ethnocentrism score (less prejudiced). The 
difference between female volunteers and non- 
volunteers on this characteristic while not sig- 
nificant was in the same direction, and when 
the scores (male and female) were combined, 
volunteers had a significantly lower ethno- 
centrism score than nonvolunteers. Volunteers 
of both sexes had a higher ACE score than 
nonvolunteers and when combined a sugges- 
tive difference was revealed. Female volun- 
teers had a significantly higher sociability 
score than female nonvolunteers. For the total 
group, the only personality variable that stood 
out was intelligence; volunteers as a group 
had a significantly higher ACE score than 
nonvolunteers. Table 3 shows that reliability 
of volunteering for the different experimental 
situations varied from .67 to .97. Reliability 
for all four groups considered as a unit was 
.86. 

3 Detailed data of the scores on the seven person- 
ality variables for the four experimental situations 
are contained in “Characteristics of Volunteers and 
Nonvolunteers in Psychological Experimentation,” a 
thesis by R. M. Martin, State College of Washington 
Library, 1957. 

4 This phase (hypnosis) of the experiment has been 
elaborated on in the J. clin. exp. Hypnosis, 1957, 5, 
176-180. 


Table 2 

t’s between Volunteers and Nonvolunteers on Specific Personality Traits for 
Different Experimental Situations 

                Intelli-  Anxi-   Ethnocen-                            Socia- 
Situation       gence a   ety b   trism c    B2-S d  B3-I e  B4-D f   bility g 
Learning 
  Males          1.41     1.16     1.55       .54     2.05    .13     2.35 
  Females†        .35      .40      .29      2.08      .98    .62 
  Combined       1.90      .01     1.21       .42      .07   1.15     1.18 
Personality 
  Males           .38     1.22      .39       .69     1.27    .59     2.20 
  Females         .93     2.15      .15       .27      .43   1.19      .10 
  Combined†       .95     2.63      .55       .72      .27    .72 
Sex 
  Males†          .13      .32      .36       .78     1.13    .90 
  Females         .93      .88      .09      1.08      .17    .77      .88 
  Combined        .92     1.24      .01       .87      .31   1.38      .18 
Hypnosis 
  Males          1.47     2.03     3.39*      .67     1.39   2.07      .59 
  Females        1.47      .09     1.47       .50     1.83   1.61     3.34* 
  Combined       2.20     1.30     3.25*      .07      .26    .16      .66 
Total group 
  Males          1.70      .41      .68       .76     1.05   1.55     1.93 
  Females†       2.36      .55      .86       .43      .41 
  Combined       3.64*     .01      .88       .16      .33   1.09     1.11 

a Intelligence (14). 
b Anxiety (13). 
c Ethnocentrism (1). 
d Self-sufficiency (3). 
e Introversion-extroversion (3). 
f Dominance-submission (3). 
g Sociability (3). 
* Significant at or beyond the .01 level of confidence. 
[† Row incomplete in the source; the surviving values are given in printed order and may not align with the column heads.] 


Table 3 

Reliability of Volunteering for Different 
Experimental Situations 

Situation        N     Reliability 
Learning        89        .91 
Personality     85        .80 
Sex             98        .67 
Hypnosis        90        .97 
Total group               .86 

Discussion 

When volunteers and nonvolunteers are 
each considered as a unit for the four ex- 
perimental situations, they show no differ- 
ence in six of the seven personality variables. 
The one exception to this was the finding that 
volunteers were significantly more intelligent 
(as defined by ACE score) than nonvolun- 
teers. The reason for this may be the fact 
that the more intelligent person is more in- 
quisitive and that volunteering might reflect 
this. The data indicated that female and male 
volunteers for personality have a suggestively 
higher manifest anxiety level than nonvolun- 
teers. It is possible that there is something im- 
plicit in the investigation of personality that 
attracts individuals that are more anxious and 
who feel the need of personal aid or informa- 
tion that such research might seem to offer. In 
the request for volunteers for attitudes on sex, 
no significant differences were found in any 
comparison. This finding has direct bearing on 
an issue that has been of some concern. It was 
pointed out that the publication of the Kin- 
sey reports (6, 7) brought up the question of 
volunteer error which Maslow and Sakoda 
(8) believed would bias its results. Siegman 
(12), on the other hand, found no differences 
between volunteers and nonvolunteers on such 
variables as anxiety, rigidity, defensiveness, 
nor on the Maslow scale of self-esteem, and 
concluded that the problem of volunteer error 
was not present in the Kinsey studies. The 
findings of the present study would tend to 
support the view of Siegman (although the 
reliability of the volunteering response for 
this situation [.67] may cloud the finding to 
some extent) and are not too surprising when 
one considers the myriad of ways employed 
by Kinsey in obtaining volunteers. Volunteers 
for hypnosis as a group were significantly less 
ethnocentric and had a suggestively higher in- 
telligence score than nonvolunteers. This, and 
the very fact that no significant differences 
were found in any comparison between volun- 
teers and nonvolunteers for hypnosis in self- 
sufficiency, dominance-submission, anxiety, 
and introversion-extroversion, was considered 
important in that popular belief suggests the 
contrary. 

While the reliability of the volunteering re- 
sponse in this study was considered satisfac- 
tory, its definition should be noted. It is de- 
fined in terms of verbal indication of willing- 
ness to volunteer, and it may be argued that 
there is a gap between what individuals say 
they will do and their actual behavior. It 
should be realized that the data do not indi- 
cate that those who volunteer for one experi- 
mental situation would volunteer for another 
situation. The data only indicate the reli- 
ability of volunteering for a given situation 
and show that regardless of the experimen- 
tal situation “request” reliability was satisfac- 
tory. 
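
Reliability of volunteering in this sense is the agreement between the first and second responses to the same request. The paper does not say which coefficient was computed; the sketch below assumes a phi coefficient on a fourfold table, and the counts are hypothetical.

```python
import math


def phi_coefficient(yes_yes, yes_no, no_yes, no_no):
    """Phi coefficient for a fourfold table of first by second response."""
    numerator = yes_yes * no_no - yes_no * no_yes
    denominator = math.sqrt(
        (yes_yes + yes_no)
        * (no_yes + no_no)
        * (yes_yes + no_yes)
        * (yes_no + no_no)
    )
    return numerator / denominator


# Hypothetical counts for one request situation, for illustration only:
# volunteered both times, first time only, second time only, neither time.
reliability = phi_coefficient(yes_yes=30, yes_no=3, no_yes=2, no_no=55)

# Perfect agreement between the two requests gives a coefficient of 1.
perfect = phi_coefficient(yes_yes=10, yes_no=0, no_yes=0, no_no=10)
```

A coefficient of this kind describes consistency within one request situation only; it says nothing about whether the same students would volunteer for a different kind of experiment, which is the caution raised in the paragraph above.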

It is interesting to note that Wilcox and 
Faw (16), using different tests and describ- 
ing the psychological traits of susceptible and 
nonsusceptible hypnotic Ss (individuals who 
actually participated in an experiment rather 
than volunteers and nonvolunteers), found 
similar results to those reported here. 

Himelstein (5) has indicated that the appearance 
of the individual requesting volunteers, 
as well as the nature of the request (and 
often in conjunction with it), might be important 
in determining the magnitude of the 
response. This factor in the present study was 
controlled by having the same experimenter, 
a presentable, young (24 years) white male, 
make all the requests. It would be interesting 
to conjecture what the magnitude of male— 
female response might have been to requests 
for volunteers for an experiment dealing with 
sex or hypnosis if the experimenter had been 
a presentable, young white female! 


Summary and Conclusion 


This study concerned the personality char- 
acteristics of volunteers and nonvolunteers for 
different types of psychological experiments. 
Four specific experimental situations (learn- 
ing, personality, attitude to sex, and hypno- 
sis) requesting volunteers were used. Three 
sections of introductory psychology were used 
for each specific request situation. The total 
number of Ss used was approximately 400. 
The request sheets asking for volunteers were 
traditional. Comparisons were made of differences 
between volunteers and nonvolunteers 
for each and for all of the four experimen- 
tal situations on the seven personality vari- 
ables. Reliability of the volunteering act was 
checked. 

Significant differences were found on two 
of the seven personality variables between 
volunteers and nonvolunteers for one of the 
four specific request situations (hypnosis). 
Suggestive differences were also indicated. 
Reliability for the different volunteering situa- 
tions was found to vary from .67 to .97. 

The general conclusion of this investigation 
was that there were personality differences 
between volunteers and nonvolunteers associ- 
ated with different types of volunteering situa- 
tions and that generalizations made from 
biased samples can obviously be misleading. 
The general practice of using volunteers prob- 
ably owes its wide application to matters of 
expediency. However, it does not seem that 
convenience should substitute for sound ex- 
perimental procedure. 


Received December 2, 1957. 


References 


1. Adorno, T. W., Frenkel-Brunswik, Else, Levin- 
son, D. J., & Sanford, R. N. The authoritarian 
personality. New York: Harper, 1950. 

2. Blake, R. R., Berkowitz, H., Bellamy, R. Q., & 
Mouton, J. S. Volunteering as an avoidance act. 
J. abnorm. soc. Psychol., 1956, 53, 154-156. 

3. Bernreuter, R. G. The personality inventory. 
Stanford, Calif.: Stanford Univer. Press, 1935. 


4. Brower, D. The role of incentive in psychological 
research. J. gen. Psychol., 1948, 39, 145-147. 

5. Himelstein, P. Taylor scale characteristics of 
volunteers and nonvolunteers for psychological 
experiments. J. abnorm. soc. Psychol., 
1956, 52, 138-139. 

6. Kinsey, A. C., Pomeroy, W. B., & Martin, C. E. 
Sexual behavior in the human male. Philadelphia: 
Saunders, 1948. 

7. Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & 
Gebhard, P. H. Sexual behavior in the human 
female. Philadelphia: Saunders, 1953. 

8. Maslow, A. H., & Sakoda, J. M. Volunteer error 
in the Kinsey study. J. abnorm. soc. Psychol., 
1952, 47, 259-262. 

9. Rosen, E. Differences between volunteers and 
nonvolunteers for psychological studies. J. 
appl. Psychol., 1951, 35, 185-193. 

10. Rosenbaum, M. E. The effect of stimulus background 
factors on the volunteering response. 
J. abnorm. soc. Psychol., 1956, 53, 118-121. 

11. Rosenbaum, M. E., & Blake, R. R. Volunteering 
as a function of field structure. J. abnorm. 
soc. Psychol., 1955, 50, 193-196. 

12. Siegman, A. Responses to a personality questionnaire 
by volunteers and nonvolunteers to a 
Kinsey interview. J. abnorm. soc. Psychol., 
1956, 52, 280-281. 

13. Taylor, Janet. A personality scale of manifest 
anxiety. J. abnorm. soc. Psychol., 1953, 48, 
285-290. 

14. Thurstone, Thelma G., & Thurstone, L. L. 
American Council on Education psychological 
examination for college freshmen. New York: 
Educational Testing Service, 1953. 

15. Wallin, P. Volunteer subjects as a source of sampling 
bias. Amer. J. Sociol., 1949, 54, 539-544. 

16. Wilcox, W. W., & Faw, V. Psychological traits 
of susceptible and unsusceptible hypnotic subjects. 
Paper read at Western Psychological Association, 
Eugene, Oregon, May, 1957. 


Journal of Consulting Psychology 
Vol. 22, No. 6, 1958 


“Not Alike” Responses in Wechsler's Similarities 
Subtest¹ 


Murray Levine 


Devereux Foundation Institute of Research & Training, Devon, Pennsylvania 


Wiener (2) recently found that the “not 
alike” error in the Similarities subtest in- 
creased in frequency as a function of instruc- 
tions designed to create suspicion. He also 
found that characteristically suspicious Ss, as 
independently measured by a scale of dis- 
trust, had greater intellectual deficit than 
more trusting Ss. One could partially confirm 
his findings if one could show that Ss who 
gave the “not alike” response spontaneously 
had a lower mean IQ than did Ss who did not 
make that particular error. If distrust or sus- 
piciousness pervaded the test performance of 
these Ss, the mean difference in IQ would be 
greater than could be accounted for by the 
lost IQ points due to the error itself. 

The 400 Ss are veterans with a wide va- 
riety of psychiatric diagnoses who had been 
referred for psychological testing in an out- 
patient setting. 

Two hundred “not alike” Ss were selected 
from our files on the basis of the appearance 
of one or more “not alike” responses to Simi- 
larities test items of the Wechsler-Bellevue 
Form I. The criteria for the “not alike” re- 
sponse were quite stringent. Control Ss were 
selected by taking the next case in alpha- 
betical order to the man who made the error. 
Unrecorded errors that were made would act 
against the hypothesis. 

1 An extended report of this study may be obtained 
without charge from Murray Levine, Ph.D., 
Devereux Foundation Institute for Research and 
Training, Devon, Pennsylvania, or for a fee from 
the American Documentation Institute. Order Document 
No. 5693, remitting $1.25 for microfilm or 
$1.25 for photocopies. 

There is a highly significant difference in 
IQ in favor of the control Ss. The mean IQ 
for the control group is 109.5 (SD = 15.03), 
while the mean for the “not alike” group is 
100.8 (SD = 14.39). The mean difference between 
groups yields a t value of 5.92, which 
is significant well beyond the .001 level of 
confidence. This difference in IQ is considerably 
greater than can be accounted for only 
on the basis of the “not alike” error. Loss due 
to “not alike” responses alone amounts to a 
maximum of three IQ points. The obtained 
loss is 8.7 IQ points. Even if we subtract 3 
IQ points from the obtained mean difference, 
using the same standard error of the difference, 
the t value would still be significant beyond 
the .01 level of confidence. 
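
The arithmetic behind this comparison can be retraced from the reported summary statistics. The sketch below is illustrative only: it assumes 200 Ss in each group (as the selection procedure implies) and an unpooled standard error, which with equal group sizes matches the pooled form; the function name is hypothetical and not part of the original report.

```python
import math

def t_independent(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, standard error of the difference) for two independent means."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff, se_diff

# Reported means and SDs; n = 200 per group is an assumption drawn from the text.
t_full, se_diff = t_independent(109.5, 15.03, 200, 100.8, 14.39, 200)

# Conservative check: remove the maximum 3-point loss due to the error itself,
# keeping the same standard error of the difference.
t_adjusted = (109.5 - 100.8 - 3.0) / se_diff

print(round(t_full, 2), round(t_adjusted, 2))
```

Under these assumptions the first ratio comes out near the reported 5.92, and the adjusted ratio remains well above the value required at the .01 level for roughly 400 degrees of freedom.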

The present finding, showing that Ss who 
make the “not alike” error have a lower mean 
IQ than Ss who do not, is in partial confirma- 
tion of Wiener’s previous work with the atti- 
tude of distrust and its relationship to intelli- 
gence. Suggestively, the attitude of distrust 
serves to reduce effectiveness in intelligence 
test performance. 

Because of the substantial differences in 
IQ found in this and a previous study of a 
qualitative error (1), it is our impression that 
further detailed analysis of intelligence test 
performance in the light of current concepts 
of ego functioning will point the way to integrating 
theories of intelligence within a more 
general theory of personality. 


Brief Report. 
Received May 15, 1958. 
Personal Adjustment, Assumed Similarity to Parents, 
and Inferred Parental-Evaluations of the Self 


Melvin Manis¹ 
Ann Arbor VA Hospital, Michigan 


Many clinicians have observed that the 
maladjusted individual tends to feel isolated 
and different from the important figures in his 
environment. One aim of the present study 
was to provide a quantitative test of this hy- 
pothesis. More specifically, in view of the 
importance of parental relationships, it was 
predicted that the maladjusted individual sees 
himself as being less like his parents than does 
the person who is relatively free of emotional 
problems. 

Two cases cited by Mowrer (14, pp. 531- 
535) led to the second hypothesis of the 
study. Using the semantic differential (15), 
he reported that a male patient at the outset 
of therapy rated the concepts “me” and 
“mother” as being more similar than the 
concepts “me” and “father”; at the termina- 
tion of therapy this pattern was reversed 
(i.e., “me” was rated more similar to “father” 
than to “mother”). Similarly, a female patient 
at first saw more similarity between the con- 
cepts “me” and “father” than between the 
concepts “me” and “mother,” with this pat- 
tern showing a reversal at the termination of 
therapy. These observations suggest the hy- 
pothesis that people with emotional disturb- 
ances see themselves as being more similar 
to their parent of the opposite sex than to 
their parent of the same sex, while people who 
are relatively well-adjusted see themselves as 
being more like their same-sexed parent. 

1 This study was conducted while the author held 
a Public Health Service career-teacher grant at the 
University of Pittsburgh. 

The third hypothesis of the study was that 
the maladjusted individual perceives his parents 
as differing in their evaluations of him. 
There are two lines of reasoning which lead 
to this prediction. Both of these formulations 
are related to Brownfain’s finding (2) that 
people with unstable self concepts, i.e., those 
who show a marked discrepancy between 
their positive (optimistic) and negative (pes- 
simistic) self-descriptions, tend to be poorly 
adjusted. From one viewpoint, perceived par- 
ental disagreement may be regarded as a 
source of anxiety which would be particularly 
potent in contributing to the instability of the 
neurotic’s self concept. On the other hand, the 
direction of causality may be reversed. That 
is, since perceived parental evaluations are 
probably colored by autistic factors, the mal- 
adjusted person may project his conflicting 
beliefs about himself onto his parents and 
conclude that their evaluations of him differ 
widely. 


Method 


Two groups of students, one well-adjusted 
and one poorly adjusted, were selected from 
an introductory psychology course; there were 
15 males and 15 females in each group. Per- 
sonal adjustment was defined in terms of 
the S’s performance on the D, Pt, Hs, Hy, L, 
and K scales of the MMPI (9). These scales 
were combined into an over-all measure of ad- 
justment by averaging the Ss’ T scores on the 
four clinical scales (including correction fac- 
tors). In selecting Ss, males and females were 
matched on the basis of this adjustment index. 
The average T score for the adjusted Ss was 
48.3; the average for the maladjusted Ss was 
64.1. Ss who had a raw score above 5 on the L 
scale, or who left out 17 or more items, were 
not included in these groups. In addition, no 
S was included in the adjusted group if he 
had a T score of 60 or above on any one of 
the clinical scales. 
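
The selection rules just described can be restated compactly. The following is a minimal sketch, not the author's procedure: it assumes that the T scores supplied already include the correction factors, and the function names, record layout, and sample scores are hypothetical.

```python
from statistics import mean

# The four clinical scales averaged into the adjustment index.
CLINICAL_SCALES = ("D", "Pt", "Hs", "Hy")

def adjustment_index(t_scores):
    """Average T score over the four clinical scales."""
    return mean(t_scores[scale] for scale in CLINICAL_SCALES)

def eligible(t_scores, l_raw, items_omitted, group):
    """Apply the exclusion rules stated in the text."""
    if l_raw > 5 or items_omitted >= 17:
        return False  # excluded from both groups
    if group == "adjusted" and any(t_scores[s] >= 60 for s in CLINICAL_SCALES):
        return False  # no adjusted S may reach T = 60 on a clinical scale
    return True

# Hypothetical S, for illustration only.
example = {"D": 50, "Pt": 46, "Hs": 48, "Hy": 49}
print(adjustment_index(example), eligible(example, l_raw=3, items_omitted=2, group="adjusted"))
```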

The Ss described their “real” selves, their 
“ideal” selves, and each of their parents on a 
series of 24 bipolar rating scales; the Ss also 
described themselves as they thought they 
were seen by each of their parents. The rating 
scales, which were presented in a seven-inter- 
val format, were derived from Cattell’s (3) 
factor analysis of the Allport-Odbert adjec- 
tive trait list (1). Cattell’s analysis resulted 
in 12 relatively independent factors. In the 
present study, the eight factors which ac- 
counted for the most variance were repre- 
sented by three scales each. The items may 
therefore be assumed to sample most of the 
important dimensions that English-speaking 
people use in describing themselves and 
others. Some typical scales were frank—secre- 
tive, immature—mature, impatient—patient, and 
intelligent-unintelligent. 

The similarity between the Ss’ self-descrip- 
tions and their descriptions of their parents 
was quantified in terms of the D statistic (6, 
16). This variable will hereafter be referred 
to as assumed similarity (AS). The similarity 
between each S’s inferred father-evaluation 
and his inferred mother-evaluation was also 
quantified using the D score. 
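
The text cites the D statistic without giving its formula. The sketch below assumes the usual generalized-distance form, the square root of the summed squared differences between two profiles over the rating scales; the function name and the sample ratings are hypothetical.

```python
import math

def d_statistic(profile_a, profile_b):
    """Generalized distance between two rating profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same scales")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical ratings on four seven-interval scales (the study used 24).
self_description = [5, 3, 6, 2]
father_description = [4, 3, 6, 5]
print(d_statistic(self_description, father_description))
```

On this reading a smaller D means greater assumed similarity, so a high AS corresponds to a low D between the self-description and the parent-description.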


Results and Discussion 


Personal Adjustment and Assumed Similarity 
to Parents 


The experimental design for the assumed 
similarity data consisted of a 2 X 2 factorial 
design (adjustment X sex) in which two 
scores were available for each S: AS to the 
same-sexed parent and AS to the opposite-sexed 
parent. These data were analyzed in 
accordance with the Type III mixed design 
discussed by Lindquist² (13, pp. 231-234). 
The results of this analysis are presented in 
Table 1. 

Table 1 
Analysis of Variance for Assumed Similarity Scores 


                                            Mean 
Source                                df   square      F      p(b) 
Between Subjects                      59 
  S’s adjustment                       1    91.98   11.04   <.01 
  S’s sex                              1             9.53   <.01 
  S’s adjustment X S’s sex             1    20.19    2.42 
  Error (b)                           56     8.33 
Within Subjects                       60 
  Parent’s sex(a)                      1 
  Parent’s sex X S’s adjustment        1 
  Parent’s sex X S’s sex               1 
  Parent’s sex X S’s adj. X S’s sex    1 
  Error (w)                           56     4.26 
Total                                119 


(a) AS to same-sexed vs. opposite-sexed parent. 
(b) Two-tailed test. 
[The mean square for S’s sex and two of the four within-subjects mean squares are not legible in the source; the two legible within-subjects values, 1.14 and 1.54 (1 df each), cannot be assigned to specific rows.] 

In this analysis, each of the mean squares 
in the “Between Subjects” part of the table 
was evaluated against Error (b), while the 
mean squares from the “Within Subjects” portion 
were evaluated against Error (w). Significant 
Fs (p < .01) were obtained for variance 
due to adjustment and for variance due 
to the S’s sex. That is, when the two assumed 
similarity scores for each S were totalled, the 
well-adjusted groups (both male and female) 
showed more assumed similarity to their parents 
on this composite measure than did the 
poorly adjusted groups. In the comparison by 
sex, males showed more similarity to their 
parents on the composite score than did 
females. 

The results of this analysis do not show 
any significant differences between assumed 
similarity to the same-sexed and the opposite-sexed 
parent. This finding was obtained 
at both levels of adjustment, as may be seen 
by the nonsignificant interaction terms in 
the “Within Subjects” portion of Table 1. 

These results support the hypothesis that 
well-adjusted people show more assumed similarity 
to their parents than do those with 
emotional disturbances. There was, however, 
a possibility of artifact: if most of the parent-descriptions 
had been highly favorable, i.e., 
similar to the Ss’ ideal selves, well-adjusted 
Ss, having high self-esteem, would of necessity 
show high AS.³ Careful examination of 
the data did not, however, support the basic 
assumptions of this view. Instead of being 
consistently favorable, the parent-descriptions 
varied widely in favorability and were, on the 
average, slightly less favorable than the Ss’ 
self-descriptions.⁴ 

2 Inspection of the data suggested that they did not 
depart markedly from normality. In addition, application 
of Bartlett’s test yielded a nonsignificant result, 
indicating that the variance within groups was 
relatively homogeneous. 

The obtained relationship between personal 
adjustment and AS is in general accord with 
previous findings (4, 10, 11, 12, 17). These 
results indicate that the maladjusted person 
typically feels that he is quite different from 
his parents. In addition, since previous re- 
search by Jourard (11) and by Fiedler, War- 
rington, and Blaisdell (7) has shown that 
high AS may reflect liking for the “other 
person,” these results suggest that the malad- 
justed person feels less warmly toward his 
parents than does the well-adjusted person. 

The hypothesis that well-adjusted Ss as- 
sume more similarity to their parent of the 
same sex than to their parent of the opposite 
sex was not supported; nor was there any 
evidence that poorly adjusted Ss assume 
greater similarity to their opposite-sexed 
parent. AS to the same-sexed parent was 
about the same as AS to the opposite-sexed 
parent for both the well-adjusted and the 
poorly adjusted Ss. It is conceivable that dif- 
ferent results would have been obtained if the 
items had been specifically selected to repre- 
sent areas in which quite different behavior 
is required of men and women. In this study, 
however, no such attempt was made; instead, 
the items were selected to provide a more 
global coverage of the Ss’ self concepts and 
their perceptions of their parents. 

The fact that women showed less AS to 
their parents than did men was unexpected. 
It may be that this finding is related to differences 
between the typical parent-son and 
parent-daughter relationship. If, as Fiedler 
(8) hypothesizes, high assumed similarity can 
be interpreted as a sign of interpersonal 
warmth and closeness, these data suggest that 
women feel more distant from their parents 
than do men. It should be noted that in spite 
of the obtained sex difference, the relationship 
between AS and personal adjustment held for 
both men and women. 

3 Child (5) has noted a similar artifact possibility 
in reviewing a study by Sopchak (17). 

4 The favorability of each parent-description was 
quantified by comparing it with the S’s ideal self, 
the D statistic (6, 16) being the measure of similarity. 
The favorability of self-descriptions was assessed 
in the same fashion. 


Personal Adjustment and Perceived Conflict 
in Parental Evaluations of the Self 


To test the hypothesis that the maladjusted 
person perceives his parents as differing in 
their evaluations of him, the discrepancies between 
the inferred parent-evaluations were 
analyzed in a 2 X 2 analysis of variance (adjustment 
X sex).⁵,⁶ The results of this analysis 
are presented in Table 2. The analysis indicates 
that there was a significant difference 
(p < .05) between the adjusted and the maladjusted 
Ss in the hypothesized direction: the 
adjusted Ss showed smaller discrepancies between 
their two inferred parent-evaluations 
than did the maladjusted Ss. 

Table 2 
Analysis of Variance for Perceived Parental Conflict 


                          Mean 
Source              df   square     F      p* 
Between groups       3    19.82   3.42   <.05 
  Adjustment         1    35.27   6.08   <.05 
  Sex                1    15.33   2.64 
  Adjustment X sex   1     8.86   1.53 
Within groups       52     5.80 
Total               55 


* Two-tailed test. 
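
The F ratios of Table 2 can be checked directly from its mean squares, since each is the effect mean square divided by the within-groups mean square. The short sketch below merely reproduces that division; the variable names are illustrative.

```python
# Mean squares as printed in Table 2.
mean_squares = {
    "Between groups": 19.82,
    "Adjustment": 35.27,
    "Sex": 15.33,
    "Adjustment X sex": 8.86,
}
ms_within = 5.80  # within-groups (error) mean square, 52 df

# F = effect mean square / error mean square.
f_ratios = {source: round(ms / ms_within, 2) for source, ms in mean_squares.items()}

# The between-groups mean square (3 df) is the average of the three 1-df effects.
ms_between_check = (35.27 + 15.33 + 8.86) / 3

for source, f in f_ratios.items():
    print(f"{source:18s} F = {f}")
```

The quotients agree with the tabled values of 3.42, 6.08, 2.64, and 1.53.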

The D scores, which constituted the measure 
of perceived parental conflict, can be 
divided into two components. Part of the D 
is due to differences between the elevations 
of the inferred parent-evaluations. Thus, an S 
may obtain a high D score because he thinks 
that one parent respects him greatly while the 
other holds him in low esteem. Differences 
between the contents of the inferred evaluations 
constitute the second component. An S 
who thinks that both parents esteem him 
equally might still show a large D because of 
his belief that his father would rate him high 
on Traits a, b, c, and low on Traits d, e, f, 
while his mother would do the opposite. 

5 Incomplete data made it necessary to eliminate 
one S from each of the originally selected groups. As 
a result, this analysis included data from 28 adjusted 
and 28 maladjusted Ss; there were 14 men and 14 
women within each adjustment category. 

6 Once again, the data were consistent with the assumptions 
underlying the analysis of variance technique 
(see Footnote 2). 

The adjusted Ss felt that they were more 
highly esteemed by their parents than did the 
maladjusted Ss. As a result, the adjusted Ss 
might necessarily have shown smaller “eleva- 
tion” differences, since, if the less favorable 
of their inferred parent-evaluations had been 
sufficiently high, the more favorable one 
could not have been much higher. The ad- 
justed group’s smaller D scores might there- 
fore have been due to a restriction on the size 
of their “elevation” differences. 

To rule out this possibility it was necessary 
to demonstrate that “elevation” differences 
were of equal magnitude for the adjusted and 
the maladjusted Ss. A new set of discrepancy 
scores solely reflective of “elevation” differences 
was therefore computed. These scores 
were obtained by (a) calculating the D between 
each parent-evaluation and the S’s 
ideal self and (b) taking the absolute difference 
between these Ds. An analysis of variance 
similar to the one reported in Table 2 
was then performed on the resultant measures. 
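
The two-step “elevation” score can be illustrated with a small example. As before, the sketch assumes that D is the root of the summed squared differences between profiles; the function names and the four-scale ratings are hypothetical and chosen only to show how content and elevation can diverge.

```python
import math

def d_statistic(profile_a, profile_b):
    """Assumed form of D: root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

def elevation_difference(father_eval, mother_eval, ideal_self):
    """(a) D of each inferred parent-evaluation from the ideal self,
    (b) absolute difference between the two Ds."""
    return abs(d_statistic(father_eval, ideal_self) - d_statistic(mother_eval, ideal_self))

# Hypothetical S: both parents are thought to hold him equally far from his
# ideal self, but on different traits.
ideal_self = [7, 7, 7, 7]
father_eval = [7, 7, 4, 4]
mother_eval = [4, 4, 7, 7]

print(elevation_difference(father_eval, mother_eval, ideal_self))  # elevation component
print(d_statistic(father_eval, mother_eval))                       # overall perceived conflict
```

In this constructed case the elevation difference is zero while the D between the two inferred evaluations is large, which is the pattern of content differences without elevation differences.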

The results of this analysis indicated that 
the adjusted and the maladjusted Ss did not 
differ in terms of their “elevation” difference 
scores; in fact, the variance due to adjustment 
was smaller than the within groups variance. 
This negative finding means that the results 
of Table 2 were not due to a restriction on 
the range of “elevation” differences which 
could be obtained by the adjusted Ss. 

The negative results of this second analysis 
also mean that the maladjusted person does 
not have any marked feeling that one parent 
regards him more favorably than the other. 
His relatively large D score is apparently due 
primarily to content differences. That is, he 
may feel that while both parents hold him in 
relatively low esteem, they do so for different 
reasons. As a hypothetical example, a malad- 
justed S might believe that one parent thinks 
of him as being secretive and short-tempered, 
while the other does not accept this charac- 
terization, but does regard him as being un- 
dependable and immature. 




While this study has related personal ad- 
justment to perceived conflict between par- 
ental evaluations of the self, the data do not 
enable us to assess the accuracy of the Ss’ 
perceptions. Subsequent research might profit- 
ably attack the question of whether there is 
a real conflict between the parental evalua- 
tions of the maladjusted person. 


Summary 


Two groups of Ss, one well-adjusted and 
one poorly adjusted, were selected from an 
introductory psychology course. The Ss de- 
scribed themselves and each of their parents 
on 24 bipolar rating scales; they also de- 
scribed themselves as they thought they were 
seen by each parent. The D statistic was used 
to quantify (a) the similarity between the Ss’ 
self-descriptions and their descriptions of 
their parents and (b) the similarity between 
each S’s two inferred parent-evaluations. 

The results supported the hypothesis that 
well-adjusted people perceive themselves as 
being more like their parents than do those 
who are poorly adjusted. There was no sup- 
port, however, for the hypothesis that well- 
adjusted people perceive themselves as being 
more similar to their parent of the same sex 
than to their parent of the opposite sex; nor 
was there any evidence that poorly adjusted 
people see themselves as being more like their 
opposite-sexed parent. An unexpected finding 
was that women perceive themselves as being 
less like their parents than do men. According 
to one viewpoint (8), these results suggest 
that women feel more distant from their 
parents than do men. 

As hypothesized, the data also indicated 
that the maladjusted Ss felt that there was a 
greater disparity between their parents’ evalu- 
ations of them than did the adjusted Ss. 
Further analysis revealed that this was not 
due to any marked feeling among the malad- 
justed Ss that one parent regarded them more 
favorably than did the other. Instead, their 
large Ds were primarily due to content differ- 
ences. For example, a maladjusted S might 
feel that his father perceived him as being 
secretive and short-tempered, while his mother 
did not accept this characterization, but instead 
regarded him as being undependable 
and immature. 


8. Fiedler, F. E. The psychological-distance dimen- 
sion in interpersonal relations. J. Pers., 1953, 
22, 142-150. 
A Comparison of the Edwards PPS Variables 
with Some Aspects of the TAT¹,² 


Tom Dilworth IV³ 
Southern Methodist University 


The Edwards Personal Preference Schedule 
(PPS) is based on 15 personality needs which 
are also frequently used in the evaluation of 
Thematic Apperception Test (TAT) protocols. 
It would appear, therefore, that the two tests 
seek to elicit comparable information with re- 
gard to needs. The present study investigated 
the correlation between the 15 PPS variables 
and the protocols derived from 10 selected 
pictures of the TAT. Of particular interest 
was assessment of the value of the PPS as a 
reasonable substitute for the TAT in clinical 
situations which may make such replacement 
necessary or desirable. A significantly positive 
correlation could indicate that the PPS (which 
usually requires less time and skill for inter- 
pretation than the TAT) might be used with 
reasonable confidence as an economical and 
valid substitute for the TAT in cases where 
time or skill of the interpreter is a determin- 
ing factor. 

1 An extended report of this study may be obtained 
without charge from Tom Dilworth IV, P.O. 
Box 1243, Lexington, Kentucky, or for a fee from 
the American Documentation Institute. Order Document 
No. 5742, remitting $1.75 for microfilm or 
$2.50 for photocopies. 

2 This report is based on a thesis submitted to the 
graduate school of Southern Methodist University in 
partial fulfillment of the requirements for the M.A. 
degree. The author is indebted to Robert E. Stoltz 
for advice and assistance during the course of the 
study. 

3 Now at the University of Kentucky. 

4 The psychologists were William R. Garretson, 
John H. Gladfelter, Joseph H. Siegel, Aubrey E. 
Wilkinson, and Harold R. Winer. 

Five qualified clinical psychologists⁴ independently 
evaluated the 10-story TAT protocols 
of 20 college males with respect to the 
relative strengths of the 15 needs also reflected 
by the PPS. The null hypothesis was 
confirmed: that there is no significantly 
positive correlation between the relative 
strengths of the 15 personality needs 
represented intra-individually by the PPS 
and the relative strengths of these same needs 
as indicated intra-individually by the TAT. 
The mean Spearman rank-difference cor- 
relation between the relative strengths of 
TAT protocol needs and the relative strengths 
of PPS needs was .15. This correlation would 
not indicate the feasibility of using the PPS 
as an economical substitute for the TAT. 
Only 2 of the 20 mean coefficients from which 
the over-all mean of .15 was derived were sig- 
nificantly positive (p = .01). Furthermore, in 
the case of one judge only was there sig- 
nificant agreement between the five highest- 
ranked TAT needs and the five highest- 
ranked PPS needs for each subject (p = .10). 
Reasonably high interjudge agreement on 
TAT need ranks was reflected by a mean 
Kendall coefficient of concordance of .40 (p 
= .02). Additional support for the reliability 
of the TAT judgments was provided by the 
fact that there was homogeneity among the 
judges in their ranking of TAT protocol needs. 
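
The two rank statistics named above follow standard textbook formulas, sketched here for untied ranks. The function names and the sample rankings are hypothetical; the report does not state how ties were handled, so none are assumed.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-difference correlation for untied ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def kendall_w(rankings):
    """Kendall coefficient of concordance for m judges ranking n items
    (untied): W = 12 * S / (m^2 * (n^3 - n))."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical ranks of the 15 needs for one S (1 = strongest need).
pps_ranks = list(range(1, 16))
tat_ranks = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 15]
print(spearman_rho(pps_ranks, tat_ranks))
```

In the study, a coefficient of the first kind would be computed for each S between his PPS ranks and his TAT ranks, and one of the second kind across the five judges' rankings of the same protocol.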


Brief Report. 
The PALS Tests: A Technique for Children to 
Evaluate Both Parents¹ 


Walter C. Williams 


University of Washington School of Medicine 


The evaluation of a parents-child relation- 
ship involves the unique interaction of three 
distinct personalities: father, mother, and 
child. The child’s behavior is commonly 
viewed as the interaction of his personality 
with his environment, which is determined to 
a great extent by the personalities and behav- 
ior of his two parents. For practical purposes, 
his behavior can be regarded as his reaction 
to his parents’ behavior as he sees and interprets 
it. 

When the child’s reaction becomes disturb- 
ing to his parents, the school, or society, the 
family is often referred to a child guidance 
clinic for assistance. Here trained personnel 
work as a team, trying to evaluate the child’s 
personality and behavior through interviews, 
“play sessions,” and tests, and also trying to 
evaluate parental characteristics through in- 
terviews with the mother and father. Often 
only the mother is seen, and it is she who 
supplies the developmental history, presents 
the complaint, gives her estimate of the situa- 
tion, and attempts to describe her own and 
her husband’s personality traits. 

1 This study was a portion of a doctoral dissertation 
completed at the University of Washington, 
1957, supported in part through research funds of 
the Department of Psychiatry, University of Washington 
School of Medicine. Thanks are extended to 
Charles Strother for his valuable assistance. Thanks 
are also due to the staffs of Seattle Public School 
Guidance Department, the Archdiocese of Seattle 
Catholic School System, the Luther Burbank and 
Briscoe Memorial Schools for Boys, the Psychiatric 
Clinic for Children, University of Washington, and 
psychiatrists Benjamin Taylor and Allan Leider, all 
of whom assisted in the securing of subjects. 

There are several major problems involved 
in this traditional method. First, what are the 
important personal characteristics in the role 
of being a parent that require evaluation? 
Second, how can one avoid weighing the contribution 
of the mother so highly as to ignore 
the involvement of the father? Finally, how 
can one bridge the gap between the interviewer’s 
evaluation of the individual parents 
and the perception and reactions of the child? 

The more prominent dimensions upon which 
parental behavior can be measured are usu- 
ally some variation of the following: Autoc- 
racy—Democracy, Warmth—Coldness, and Pos- 
sessiveness—Detachment, suggested by the 
Fels group (1, 9); Authoritarian, Laissez- 
faire, and Democracy, suggested by Levin 
et al. (12); Dominant, Possessive, and Ignor- 
ing, suggested by Shoben (14). Freud (5) 
emphasized the oedipal pattern of castrating 
father with seductive mother, while Gorer (6) 
and Wolfenstein (17) typify the American 
pattern as weak father with authoritarian 
mother. Attempts have also been made to 
relate specific childhood syndromes to pa- 
rental characteristics. Studies of schizophrenic 
children often point out the frequent occur- 
rence of a dominating, overprotective or re- 
jecting, immature mother, combined with a 
cold and intellectualizing father. Delinquent 
or “acting-out” children are often found to 
come from homes broken by death or divorce, 
where parents who are hostile or indifferent 
exercise discipline which may be classified as 
lax, rigid, or erratic. 

The second problem, i.e., relying too heavily 
on the mother’s interview, stems from the 
practical aspects of having clinic hours dur- 
ing the father’s normal working day, making 
it easier to see the mother only. Much has 
been said of the “mother-absent” issue, both 
in the case of institutionalized children and 
the working mother, and there is increasing 
interest in the obvious absence of the father 
from the home in our current culture. A great 
many studies indicate that the absence of the 
father “makes a difference” and it seems justi- 
fied to assume that his presence makes a dif- 
ference also. Many clinic cases involve a gen- 
eral domestic discord, and the ego-defensive 
statements of the mother about her own, and 
her husband’s, involvement in the child’s 
problem cannot be taken at face value. Be- 
cause of the recognized unreliability of such 
“self-analysis,” several attempts have been 
made to standardize, in part, the mother’s 
interview (13, 14), and many clinics insist 
that the father be seen also. 

Finally, there is the difficult professional 
problem, after sorting out the relevant pa- 
rental characteristics and situational “facts,” 
of relating this material to the child’s be- 
havioral reactions. Between the environment, 
as seen by the professional interviewer, and 
the behavior, as reported and verified by ob- 
servation in the clinic, stands a perceiving 
organism: the child. The immature, at times 
even neurotic or psychotic, child may place a 
quite different interpretation on his parents’ 
behavior than would an adult, and it is to 
his interpretation that he is reacting. Parents 
have been found to differ in attitude toward 
children in such variables as ordinal position, 
sex, age levels, to mention just a few. Each 
parent-child relationship appears to be unique 
and has predictive value only in terms of the 
child’s perception of his parents as adult 
sources of need-fulfillment. 

The present study attempts to meet these 
three problems by allowing the child to 
evaluate his parents as he sees and reacts to 
them. It selects for dimensions two parental 
aspects recognized in our culture as impor- 
tant in the role of the parent: a person who 
“should or must be obeyed for some reason,” 
here termed Authority; a person who is “a 
source of warmth and emotional support,” or 
Love. Both mother and father are evaluated 
by the child in these two dimensions, high or 
low in Authority, high or low in Love, on a 
small “battery” of two tests: one a projec- 
tive, one a rating scale. The scoring system 
is objective, having a consistent frame of 
theoretical reference which permits the two 
test forms to be compared and contrasted. 

The research involves two major steps. 
First, a technique was developed that was 
suitable for use with children, measured the 
hypothesized parental combinations, and had 
satisfactory interrater reliability. Secondly, an 
exploratory study of this new technique was 
made which established that it is a practical 
method of distinguishing between two groups 
of child behavior patterns. 


Dimensions Selected for Measurement 


The Authority and Love dimensions se- 
lected as the basic parental characteristics to 
be measured were suggested by the general 
personality theory of Leary, Coffey, et al. (4, 
7, 8, 10, 11). These researchers use a circu- 
lar continuum of personality variables, using 
these two dimensions as horizontal and verti- 
cal axes, forming a quadrant system of High 
Authority-High Love (HA-HL), Low Au- 
thority-High Love (LA-HL), Low Author- 
ity—Low Love (LA-LL), and High Authority- 
Low Love (HA-LL). Since each dimension 
can be combined with the high or low of the 
other, they are seen as relatively independent. 

In this study, the terms were simply de- 
fined as Authority meaning “one who should 
or must be obeyed for some reason” and Love 
meaning “one who is a source of warmth and 
emotional support.” Test items were selected 
which would combine the high and low of 
each term, judged by a panel of 20 experts 
in the field of parent-child relations. Selection, 
by the child, of a preponderance of statements 
that the parent was one who was regarded 
as a person who should or must be obeyed, 
placed that parent in a High Authority cate- 
gory, whereas a preponderance of statements 
prejudged as low in this quality placed the 
parent in a Low Authority category. In like 
manner, the child’s selection of High Love or 
Low Love statements placed his parent in 
these respective categories. Since every test 
item, or statement, contains these prejudged 
qualities of high or low in each of the two 
dimensions, the algebraic sum of the test 
items places the parent into one of the four 
categories: HA-HL; LA-HL; LA-LL; HA-LL. 
Where algebraic summation on each dimension 
showed cancellation to a near zero 
point on both, a fifth category, the Psychologically 
Unknown Parent, was recorded. Since 
each child has two parents, each with five 
possibilities, the scoring system permits 25 
possible parental combinations for each child. 
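
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, not the published scoring key. The function name `classify_parent`, the +1/-1 coding of each selected statement, the `near_zero` cutoff of 1, and the treatment of a zero sum on a single axis as "low" are all assumptions; the text says only that cancellation "to a near zero point" on both axes defines the Psychologically Unknown Parent.

```python
from itertools import product

# Each selected statement carries a prejudged category such as "HA-LL".
# Assumed coding: high = +1, low = -1 on each axis.
AXIS_VALUE = {"HA": 1, "LA": -1, "HL": 1, "LL": -1}


def classify_parent(selected, near_zero=1):
    """Return one of the five categories for a list of selected item labels.

    near_zero is a hypothetical cutoff; the paper gives no numeric value.
    """
    authority = sum(AXIS_VALUE[label.split("-")[0]] for label in selected)
    love = sum(AXIS_VALUE[label.split("-")[1]] for label in selected)
    if abs(authority) <= near_zero and abs(love) <= near_zero:
        return "Unknown"
    a = "HA" if authority > 0 else "LA"  # assumption: a zero sum counts as low
    l = "HL" if love > 0 else "LL"
    return f"{a}-{l}"


CATEGORIES = ["HA-HL", "LA-HL", "LA-LL", "HA-LL", "Unknown"]
# Mother and father each fall in one of five categories: 5 x 5 = 25 pairs.
COMBINATIONS = list(product(CATEGORIES, repeat=2))

if __name__ == "__main__":
    sample = ["HA-HL"] * 5 + ["HA-LL"] * 2 + ["LA-HL"]
    print(classify_parent(sample))   # Authority sum 6, Love sum 4
    print(len(COMBINATIONS))
```

In the sample, the Authority sum is 5 + 2 - 1 = 6 and the Love sum is 5 - 2 + 1 = 4, so the parent falls in the HA-HL quadrant.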

Each of these five categories provides mean- 
ingful definitions for many of the parental 
characteristics suggested by the literature. 
Authority, exercised with Low Love (rejec- 
tion) resembles “authoritarianism” or “autoc- 
racy,” enforcing obedience through fear and 
punishment. Authority with High Love pro- 
vides safeguards and guidance through re- 
spect and identification, or “democracy.” Low 
Authority encompasses “laissez-faire” in its 
two aspects: with Low Love it is “ignoring”; 
with High Love it is “overindulgent” or “per- 
missive.” Excessively High Love and High 
Authority may amount to “overprotective” or 
“overpossessive.” The Love axis tends to be 
the leavening agent for the Authority axis, 
providing a two-dimensional system that resembles 
Baldwin’s three-dimensional analysis 
of parent personality (1) which also uses a 
central axis of “Warmth—Coldness,” called 
“High Love—Low Love” in this study. 


Development of the Measuring Instruments 


Several goals were set in the initial plan- 
ning of the new tests. First, a small “battery” 
was felt preferable, using the two types of 
tests found most useful in assessing interper- 
sonal relations: the rating scale and the pro- 
jective. The projective was provided with 
prejudged, forced selection answers, so that 
it could be objectively scored within the same 
theoretical framework as the rating scale. The 
cartoon form of this test provided the second 
goal: it made the material intrinsically interesting 
to children. A third goal, that all words 
be geared to a third grade reading level to 
permit self-administration, was accomplished 
with the assistance of current second and 
third grade readers and Durrell’s reading lists 
(2). The final goal, acceptance by the three 
disciplines dealing with parent-child relations 
in a clinic setting, was accomplished by using 
seven psychiatrists, seven clinical psycholo- 
gists, and six psychiatric social workers as the 
judges of the test items.² These 20 experts 
had test items submitted to them individually, 
were given the definitions of Authority and 
Love stated above, and were asked to rate 
each item as high or low on each dimension 
separately. The method of sequential experi- 
mentation was used with the first six raters, 
three from each discipline, called the Criterion 
group. These six had to agree to a criterion 
level of 91% to 100%, i.e., 11 of the 
12 ratings, for an item to be accepted and in- 
corporated into the test. Items not meeting 
this level were eliminated and new items re- 
submitted to this panel individually until all 
items met this standard. These final items 
were then submitted, without change, to the 
additional 14 raters. Final total agreement 
with the present scoring system, including the 
Criterion group’s ratings, was 94% with the 
Rating scale form and 98% with the Projec- 
tive form. 
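
The arithmetic behind the criterion level can be made explicit: six Criterion raters each give two ratings per item (one for Authority, one for Love), or 12 ratings, and 11 of 12 is about 91.7%. The sketch below is a hedged illustration. The function names, the "high"/"low" strings, and the idea of comparing each rating with an intended key are assumptions, since the paper does not describe how agreement was tallied.

```python
def criterion_agreement(ratings, keyed):
    """Count agreeing ratings among the Criterion group.

    ratings: one (authority, love) pair per rater, each "high" or "low".
    keyed:   the (authority, love) pair the item was written to express
             (hypothetical; the paper does not describe this bookkeeping).
    """
    agree = sum((a == keyed[0]) + (l == keyed[1]) for a, l in ratings)
    total = 2 * len(ratings)
    return agree, total, agree / total


def item_accepted(ratings, keyed, minimum=11):
    """Apply the stated rule: at least 11 of the 12 ratings must agree."""
    agree, _, _ = criterion_agreement(ratings, keyed)
    return agree >= minimum


if __name__ == "__main__":
    # Five raters match the key on both axes; one differs on Love only.
    ratings = [("high", "low")] * 5 + [("high", "high")]
    print(criterion_agreement(ratings, ("high", "low")))
    print(item_accepted(ratings, ("high", "low")))
```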

Each form of the test has 32 items for each 
parent, or 64 items in all. Of the 32 items in 
each scale, 8 fall into each of the four prerated 
categories of the system: HA-HL; LA-HL; 
LA-LL; and HA-LL. When each axis is 
scored separately, there are 16 possible HA 
statements, 16 possible LA, 16 possible HL, 
and 16 possible LL statements. The highest 
score in any of these four “characteristics” 
would be 16, and the lowest score would be 
a selection of eight high, eight low, cancelling 
algebraically to zero. This algebraic sum for 
each axis, Authority and Love, either positive 
or negative, can be plotted on a circular grid 
to give a visual presentation of the parent’s 
position within one of the four quadrants, or 
close to the axis intersection. This last is, of 
course, the Psychologically Unknown Parent. 
This visual grid system is similar to the one 
used in the Leary-Coffey Interpersonal system 
(4) and resembles the standard four-quadrant 
grids showing ordinate and abscissa 
for the plotting of algebraic coordinates. 


2 Special thanks are due these extremely busy professional 
people who sacrificed valuable personal time 
to rate, and often rerate, items many times: psychiatrists 
Francis Bobbitt, Lida Brown, Ardis Candy, 
Richard Jarvis, Herbert Ripley, Glenn Strand, Benjamin 
Taylor; psychologists Sidney Bijou, Barbara 
Etzel, Ralph Herschstein, Otis Ramsey, Ira Steisel, 
Charles Strother, Theodore Tjossem; and social 
workers Arthur Abrahamson, Virginia Cowling, 
Clarice Hoekstra, Margaret Mykut, Carol Northrop, 
and Nancy Post. 

The two forms, in combination, are called 
the PALS tests, short for Parental Authority— 
Love Statements.³ The projective, in 16-page 
booklet form, devotes eight pages to a child 
interacting with his mother, and eight dupli- 
cate pages showing the child interacting with 
his father. Using blank-faced line drawings, 
similar to Rosenzweig’s Picture-Frustration 
Test, the child is shown in a “Need” situa- 
tion, that is, presenting a problem upon which 
the parent must take some action. The eight 
Needs shown with each parent follow Mur- 
ray’s formulation of basic physiological and 
psychological needs: Food, Sleep, Elimination, 
Overt Affection, Independence, Aggression, 
Socialization, and Succorance. These Needs 
are pictured as problem areas in the parent- 
child relationship, showing the typical “com- 
plaints” seen in clinics. Thus the Need Food 
shows a child refusing to eat certain foods, 
the Need Elimination shows a child admitting 
enuresis, the Need Aggression shows sibling 
rivalry. Four “typical” answers for the par- 
ent to make are shown below the cartoon, and 
represent the four major categories. These 
answers represent actual solutions presented 
by parents in a clinic setting when discussing 
the actual situations complained about. The 
child being tested reads instructions present- 
ing these situations as “a boy and his mother 
and father,” and is asked to circle the answer 
he thinks the mother or father would give. 
A Girl’s Booklet shows a girl in these situa- 
tions, and a Boy’s Booklet shows a boy, 
aiding identification. The projective form is 
called PEN PALS (Projected Essential Needs, 
Parental Authority—Love Statements). 

The rating scale is identical for both boys 
and girls, and is called Child’s PALS. One 
side of the page contains 32 items of a gen- 
eral personality nature appropriate to fathers 
and has space for the child to mark each item 
as “Like my Father” or “Not like Father.” 
The opposite side has 32 items appropriate 
to mothers and a similar arrangement to mark 
these statements as Like or Not like Mother. 
The choice is again forced, and the items are 
prejudged into one of the four major categories. 
Note, however, that the child is here 
told that he is rating his own parent, and his 
evaluative statements are much more conscious 
than when marking the projective. For 
this reason the projective was always given 
first in the exploratory study. 


3 Inquiries concerning copies of the tests may be 
addressed to the author: Walter C. Williams, University 
of Washington School of Medicine, Seattle 5, 
Washington. 


Exploratory Study 
Subjects and Administration 


To determine if the PALS tests could dis- 
tinguish between two populations, thus indi- 
cating practical diagnostic value, two well- 
defined samples were selected. The known 
difference was a matter of public record in 
some social agency: one group had been 
classed by society as “acting-out” in a hos- 
tile and aggressive manner, whereas the other 
group had not been referred for this behavior 
but were classed by their teachers as “nor- 
mal” and well-behaved, channeling their ag- 
gressions into socially acceptable competi- 
tion. A maximum control was established on 
all other variables: the two groups were 
matched as to age, sex, race, type of school 
attended, socioeconomic status of family, and 
presence of two natural parents in the home. 
Boys, aged 9½ to 13½, white, of normal intelligence, 
living at home with both natural 
parents, were also matched as to type of oc- 
cupation of fathers. Fifty children were used 
in each population, or one hundred in all. 
The hypotheses dealt with predictions that 
the two groups would use the PALS tests to 
classify their parents in a different manner: 
the Acting-out group as “socially undesirable” 
people, the Normal group as “socially desir- 
able” people. 

The Acting-out children were tested first, 
then matched from the wider selection of 
Normal children, 14% from private schools, 
86% from public schools, in accordance with 
the proportions in the Seattle school systems. 
“Acting-out” was defined as having social 
agency or private practice referral for such 
socially undesirable behavior as overt aggres- 
sion, stealing, lying, destructiveness, running 
away, or other overt revolt against adult au- 
thority. Twenty-three, mostly from the lower- 
income group, had been committed by court 
action to special schools, 23 were patients at 
the University’s Psychiatric Clinic for Chil- 
dren, and represented lower to upper middle 
class. Four were upper class, patients of psy- 
chiatrists in private practice. Normals were 
matched to this group using Type and Level 
of parental occupation, a modification of the 
Index of Status Characteristics published by 
Warner, Meeker, and Eells (15). The mean 
value indicated that the matched populations 
clustered around the skilled and semiskilled 
manual occupations. The median age for both 
groups was 11.54, with half above and half 
below this value. All were in the upper four 
elementary school years. The Normals were 
individually selected with the cooperation of 
teachers to match the acting-out group ex- 
cept for the tested variable. 

All tests were administered by the experi- 
menter who was male, white, age 42. Tests 
were administered individually in a private 
room. Standard instructions for self-adminis- 
tration were given, and reading ability veri- 
fied by having the child read the instructions 
aloud. The projective, PEN PALS, was given 
first, followed by the rating scale, Child’s 
PALS. Time ranged from 20 to 30 minutes 
for both tests, including checking to see that 
all items had been marked. 


Scoring and Evaluation of Test Results 


The scoring system, representing an alge- 
braic summation on two axes, places the in- 
dividual parent into one of the five categories. 
Since the variable being studied in the two 
groups of children dealt with the social desir- 
ability of their behavior, a similar division 
was logical in relating the category of the 
parent to the behavior of the child. The two 
categories indicating High Love were selected 
as SD, or Socially Desirable, whereas the 
three categories indicating rejection were 
classed as SU, or Socially Undesirable. The 
Authority axis was relegated to a secondary 
position, since High Authority with Love has 
favorable social connotations of providing the 
child with understanding controls, but Low 
Authority with Love, or Permissiveness, is 
also favored in our culture. Taking the par- 
ents as a “team,” both should be high in Love 
for an SD combination. Thus, of the 25 pos- 
sible parental combinations, only 4 are SD, 
while 21 are SU because one or both parents 
are classed as rejecting. 
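
The count of 4 SD and 21 SU combinations can be checked by enumeration. The following is a minimal sketch; the category labels and function names are illustrative choices, not part of the published scoring materials. It also shows that, if all 25 combinations were equally likely, a group of 50 children would be expected to yield 8 SD and 42 SU combinations.

```python
from itertools import product

CATEGORIES = ["HA-HL", "LA-HL", "LA-LL", "HA-LL", "Unknown"]


def is_socially_desirable(mother, father):
    """SD requires both parents to be classed as High Love."""
    return mother.endswith("HL") and father.endswith("HL")


def tally(n_children=50):
    """Count SD and SU pairs and the chance expectation for n_children."""
    pairs = list(product(CATEGORIES, repeat=2))
    sd = sum(is_socially_desirable(m, f) for m, f in pairs)
    su = len(pairs) - sd
    expected_sd = n_children * sd / len(pairs)
    expected_su = n_children * su / len(pairs)
    return sd, su, expected_sd, expected_su


if __name__ == "__main__":
    print(tally())
```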


Results 


Twenty-three of the 25 possible parental 
combinations appeared in the 200 protocols 
of the combined groups. The two missing pat- 
terns showed the mother low on both axes, 
with the father High Authority—Low Love, 
and Low Authority-High Love. Two com- 
binations accounted for 60% of all ratings. 
Combination I, showing both parents high in 
both axes, accounted for 37% of all ratings. 
Combination II showed the father high on 
both axes, with the mother high in Love, low 
in Authority, and represented 23% of the 
ratings. All other patterns were below 5% of 
the total. 

Comparing the two populations, however, 
presented quite a different picture. Combina- 
tion I accounted for 55% of all Normals’ rat- 
ings but only 19% of the Acting-out group’s 
ratings, and Combination II was selected by 
27% of the Normals and 18% of the Acting- 
out children. In summary, these two, showing 
father high in both Authority and Love, and 
mother high in Love but not as strong in Au- 
thority, accounted for 82% of all the Normal 
children’s ratings, but only 37% of the Act- 
ing-out children’s ratings. The remaining 63% 
of this last group’s ratings tended to cluster 
around those designating the father as “Au- 
thoritarian,” i.e., high in Authority, low in 
Love, accounting for another 48%, in contrast 
to only 14% of the Normals’ ratings of father 
in this category. The Acting-out children also 
tended to class their mothers as Psychologi- 
cally Unknown, that is, either inconsistent or 
inadequate in both Authority and Love. Thus 
38% of their ratings of mother fell in this 
category, as compared to only 4% of the 
Normals’ ratings. 

Grouping the parental combination ratings 
for each test form, for each population, in 
terms of the previously established definition 
of Social Desirability (high in Love for both 
parents), Table 1 presents the observed data. 
Evaluating this data against an “expected” or 
statistically derived cell frequency of 8 SD, 
42 SU, the chi square values were highly significant 
(beyond the .001 level) for both Normal 
and Acting-out children using the rating 
sheet, Child’s PALS, and for the Normals 
using the projective PEN PALS. The data 
for the Acting-out children, using the projective 
form, was not significantly different 
from a chance distribution. 


Table 1 

Number of Children in the Normal and Acting-Out 
Populations Rating Their Parental Combinations as 
Socially Desirable or Socially Undesirable, Contrasting 
the Use of the Rating Scale, Child’s PALS, to the 
Projective Form, PEN PALS 

               Child’s PALS          PEN PALS 
               SD    SU     N        SD    SU     N 
Normals        49     1    50        38    12    50 
Acting-out     36    14    50        11    39    50 
Totals         85    15   100        49    51   100 
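
The comparison against chance can be retraced with a one-line goodness-of-fit statistic. The sketch below is illustrative rather than the author's computation. The observed counts are those of Table 1, with the Acting-out PEN PALS counts taken as 11 and 39 (the column totals of 49 and 51 minus the Normals' 38 and 12). The critical value 10.83 is the standard chi square cutoff at the .001 level for 1 df.

```python
def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic for matched count sequences."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


EXPECTED = (8, 42)  # chance expectation stated in the text: 8 SD, 42 SU
OBSERVED = {
    "Normals, Child's PALS": (49, 1),
    "Acting-out, Child's PALS": (36, 14),
    "Normals, PEN PALS": (38, 12),
    "Acting-out, PEN PALS": (11, 39),
}
CRITICAL_001_DF1 = 10.83

RESULTS = {name: chi_square_gof(obs, EXPECTED) for name, obs in OBSERVED.items()}

if __name__ == "__main__":
    for name, value in RESULTS.items():
        verdict = "beyond .001" if value > CRITICAL_001_DF1 else "not beyond .001"
        print(f"{name}: chi square = {value:.2f} ({verdict})")
```

Under these assumptions only the Acting-out children's projective data stay close to the chance values, in line with the statement above.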

It was now possible to compare the popu- 
lations with each other, analyzing their per- 
formances on each test form, evaluating the 
major hypothesis that each form had the 
ability to distinguish the two populations. On 
the rating scale, Child’s PALS, the comparison 
of the two groups yielded a chi square 
value of 11.29, significant beyond the .001 
level at 1 df. On the projective form PEN 
PALS, the chi square value was even higher, 
27.05, beyond the .001 level at 1 df. The 
hypothesis was supported, therefore, that on 
each form of the test the Normals rated 
their parents as socially desirable combina- 
tions more frequently than did the Acting- 
out children. 
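
For readers who wish to retrace these figures, the sketch below computes chi square for a 2 x 2 table. The text does not say whether a continuity correction was used; applying Yates' correction is an assumption here, made because it reproduces the reported values of 11.29 and 27.05 from the Table 1 counts. The Acting-out PEN PALS counts are taken as 11 SD and 39 SU, as implied by the column totals.

```python
def chi_square_2x2_yates(a, b, c, d):
    """Chi square with Yates' continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: Normals, Acting-out. Columns: SD, SU.
CHILDS_PALS = chi_square_2x2_yates(49, 1, 36, 14)
PEN_PALS = chi_square_2x2_yates(38, 12, 11, 39)

if __name__ == "__main__":
    print(f"Child's PALS: {CHILDS_PALS:.2f}")
    print(f"PEN PALS:     {PEN_PALS:.2f}")
```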

It was also hypothesized that the Normal 
children would be more consistent in their 
evaluation of their individual parent, whereas 
the Acting-out children would tend to change 
more often, that is, rate a parent one way on 
the projective, another way on the rating 
scale. To test this the two forms were taken 
as a unit: either they were both used to rate 
the parents as SD, or both SU, or SD on one 
form, SU on the other. Table 2 shows the 
data recorded in this fashion, a three-way di- 
vision for the father and another three-way 
division for the mother. The term “Change” 
is used here to mean “absolute change,” since 
the direction, that is, which test was SU and 
which SD, is ignored. 

Each of the three columns, or “possibili- 
ties,” can be evaluated by comparing it to 
the sum of the other two. Using Fisher’s Ex- 
act Method (3), a probability of less than 
.001 was found for the difference between the 
two groups rating their parents as SD. The 
hypothesis is therefore supported that Nor- 
mal children will tend to rate both mother 
and father as Socially Desirable consistently 
on both forms of the test more often than will 
the Acting-out children. 

Using the same method, that is, combining 
the SD and Change to compare with the SU 
column, it was found that the hypothesis was 
supported that the Acting-out children tended 
to rate their parents consistently SU more 
often than did the Normals. Fisher’s method 
gave a less significant value for this type of 
“consistency”: .003 for the difference in rating 
fathers and .013 for the difference in rating 
mothers. Both are accepted as statistically 
significant at these levels, however. 


Table 2 

Number of Children Rating Individual Parents as Socially Desirable on Both Forms of the PALS Tests, 
as Socially Undesirable on Both Forms, or as Socially Desirable on One 
Form, Socially Undesirable on the Other 

                        Fathers                         Mothers 
               Both  Both  Change              Both  Both  Change 
               SD    SU    SD-SU     N         SD    SU    SD-SU     N 
Normals        39     0     11      50         46     0      4      50 
Acting-out     14     8     28      50         21     6     23      50 
Totals         53     8     39     100         67     6     27     100 
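
The two probabilities for consistent SU ratings can be reproduced from the Table 2 counts. The sketch below is a plain hypergeometric calculation and makes one assumption: that the test was one-tailed, which the text does not state but which yields .003 for fathers and .013 for mothers. The function name and argument order are illustrative.

```python
from math import comb


def fisher_lower_tail(a, b, c, d):
    """One-tailed Fisher exact probability that the top-left cell is <= a.

    The table is [[a, b], [c, d]] with all margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 - (n - col1))
    total = comb(n, row1)
    return sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(lowest, a + 1)
    ) / total


# Rows: Normals, Acting-out. Columns: Both SU, all other ratings.
P_FATHERS = fisher_lower_tail(0, 50, 8, 42)
P_MOTHERS = fisher_lower_tail(0, 50, 6, 44)

if __name__ == "__main__":
    print(f"fathers: {P_FATHERS:.3f}")
    print(f"mothers: {P_MOTHERS:.3f}")
```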

Combining the two “Both” columns, we 
compare “Consistency” vs. “Change.” As was 
predicted, the Normals were more consistent, 
whereas the Acting-out group more often 
tended to change the parent from SD on one 
test, then shift to SU on the other. Chi square 
for both the mother and father data was sig- 
nificant at the .001 level. All the changes 
shown for the father data were in the direction 
of SD on the Child’s PALS to SU on the PEN 
PALS, therefore no additional analysis was 
necessary. The “Change” shown for the 
mother data involved a reversal in four of 
the ratings: three for the Acting-out and one 
for the Normals. Additional tests were run to 
insure that these reversals did not affect the 
statistical significance. They were deducted 
from the “Change” value, and the difference 
was still found to be significant at the .001 
level. The reversals were evaluated by Fisher’s 
method, and a probability of .25 was found, 
i.e., not significant. The only significant dif- 
ference appears to be the more usual pattern 
of Change: rating the parent SD on the rat- 
ing scale and SU on the projective. 

As with the other data, chi square tests 
were run routinely on the two “Consistency” 
measures and the “Change” measure to see if 
they were statistically different from a chance 
distribution. Expectancy for Both SD is 8, 
for Both SU is 18, for Change is 24. All 
four populations were evaluated: fathers and 
mothers for each group. The tests for all four 
sets of data showed the SD ratings more fre- 
quent than chance expectations at better than 
the .05 level, the SU ratings less frequent than 
chance at the .01 level. The data for both 
parents, Normal group, for Change was sig- 
nificantly less than chance at the .01 level. 
The “Change” value for the Acting-out group 
for both parents was not significantly differ- 
ent from a chance distribution. 
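
The expectancies of 8, 18, and 24 are not derived in the text. One reading consistent with them, offered here as an assumption, is that a single parent has a 2-in-5 chance of an SD category on each form (two of the five categories are High Love) and that the two forms are treated as independent. A short sketch of that arithmetic follows.

```python
from fractions import Fraction

# Assumed chance of an SD rating on one form: 2 of the 5 categories are High Love.
P_SD = Fraction(2, 5)


def expected_counts(n_children=50):
    """Chance expectation for Both SD, Both SU, and Change across two forms."""
    p_su = 1 - P_SD
    return {
        "Both SD": n_children * P_SD * P_SD,    # (2/5)^2 = 4/25
        "Both SU": n_children * p_su * p_su,    # (3/5)^2 = 9/25
        "Change": n_children * 2 * P_SD * p_su,  # 2(2/5)(3/5) = 12/25
    }


if __name__ == "__main__":
    for label, value in expected_counts().items():
        print(label, value)
```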


Discussion 
The Testing Instruments 


Although it is difficult to evaluate the clini- 
cal usefulness of a testing instrument until it 
has been used with many populations over an 
extended period of time, there are several 
practical aspects which favor the type of technique 
used in this study. Administration of 
both forms takes from 15 to 30 minutes, and 
scoring takes less than 10 minutes. The stand- 
ardized scoring system, based on expert opin- 
ion, not only eliminated to a great extent the 
test-maker’s personal bias in item selection 
but will continue to function in this way to 
eliminate the administrators’ bias in test in- 
terpretation. Item selection through sequen- 
tial experimentation with representatives of 
psychiatry, clinical psychology, and social 
work tended to screen out interdisciplinary 
differences and should increase communica- 
tion and acceptance of the test results. 

Of the two forms, the projective (PEN 
PALS) had greater discriminatory power, was 
less subject to social-desirability factors, and 
was obviously more interesting to both raters 
and children. Although there is a temptation 
to use this form without the Child’s PALS, 
several factors indicate that the saving in 
time would not compensate for the loss of 
information. The more direct rating scale 
most closely resembles the type of answers 
given by a child in a series of psychiatric in- 
terviews when asked to describe his parents. 
Unlike the forced choice of the PEN PALS, 
the child is free to discard an item as “not 
like” his parent, and those he selects carry 
extra weight because of this factor. By itself 
the Child’s PALS shows discriminatory power, 
and appears to tap the “more conscious” rec- 
ognition of parental characteristics, in con- 
trast to the “less conscious” areas expressed 
in the projective form. Both forms, used as a 
small “battery,” give a more complete picture 
of the child’s overt and covert feeling about 
his parents as people as well as parents, and 
within a consistent frame of reference for 
purposes of comparison. Finally, a “Consist- 
ency” or “Change” evaluation, indicated as 
a significant factor by analysis, depends on 
the use of both forms. 


The Exploratory Study 


The use of 23 of the 25 parental combina- 
tions possible in this scoring system would in- 
dicate that classification systems limiting pa- 
rental combinations to a few cultural stereo- 
types may miss important relationships. The 
two favored combinations, on inspection, appear 
to be the most culturally acceptable, 
however, and present a measure of “conformance.” 
The fact that 82% of the ratings by 
Normals, in contrast to only 37% of the Acting-out 
group’s ratings, conformed to the cultural 
ideals suggests that the PALS tests are 
sampling an important aspect of the difficul- 
ties experienced by Acting-out children. The 
present study makes no claims as to where 
the “nonconformance” is located. The present 
level of interpretation places it merely in the 
behavior of the child on this test, but this 
does not preclude the strong possibility that 
additional investigation will move the level of 
interpretation up to the behavior of the par- 
ent. It may well be that this Acting-out child 
is using this test as just another means of 
acting-out against his parent, making it a 
sensitive behavioral sample. It may also be 
that the parent is nonconforming, at various 
levels of unconscious to conscious or covert to 
overt behavior, and that this is perceived and 
reacted to by the child in imitation and identification. 
Additional studies are now in progress 
to determine what levels of interpretation 
seem justified by the data. 

Several points require clarification. The so- 
cioeconomic status represented the less con- 
trolled data of the Acting-out group and be- 
came a selective factor in the Normals, who 
were used as matched controls. It cannot be 
interpreted to mean that “in general, Nor- 
mals and Acting-out children come from the 
same socioeconomic group.” The requirement 
that all children come from unbroken homes 
was also a selective factor for both groups. 
It became apparent, however, that it was 
much more selective for the Acting-out chil- 
dren, since it was so much more difficult to 
find children of this group who did not come 
from broken homes. The Acting-out children 
selected, therefore, were probably “more nor- 
mal” than an unselected group would be. A 
third element requires consideration: the or- 
der of test presentation. The projective form 
was always given first, and it is unknown how 
this affected the difference found in the two 
forms. 

All hypotheses were confirmed, regarding 
the difference in performance of the two populations 
studied, at a high level of statistical 
confidence. Both test forms, taken as indi- 
vidual tests and in combination to form a 
“battery,” were found to distinguish the popu- 
lations. It is obvious that the principal con- 
tribution of this study was the development 
of the technique, not a thorough analysis of 
the differences between normal and acting-out 
children, nor a complete analysis of the vari- 
ous uses of the test forms. The limited ex- 
ploratory study was used to show discrimi- 
natory power and support plans for more in- 
tensive research with additional populations. 
Such plans include studies of girls, of differ- 
ent age groups, cross-validation with other 
techniques using the child-centered view- 
point, having parents evaluate themselves and 
spouse, having expert interviewers fill out the 
test for the three individuals involved. 


Summary 

A new technique, combining a rating scale 
and projective with consistent theoretical 
framework and objective scoring system, has 
been developed as a research and applied 
clinical tool in the area of parent-child rela- 
tionships. Geared to third-grade reading level, 
it permits a child to evaluate his unique in- 
terpersonal relationship with each parent as 
he sees it. He rates each parent on two 
continua: Authority (one who should or must 
be obeyed for some reason) and Love (a 
source of warmth and emotional support). 
The “high” and “low” of each used as axes 
form a quadrant scoring system of four ma- 
jor categories, plus a fifth, the “Psychologi- 
cally Unknown” parent, defined in terms of 
algebraic cancellation. 

The rating scale, Child’s Parental Author- 
ity—Love Statements (Child’s PALS) and the 
projective, Projected Essential Needs, Paren- 
tal Authority—Love Statements (PEN PALS) 
can be visually plotted to compare and con- 
trast the conscious and less conscious evalua- 
tion of each parent. Test items were derived 
by the method of sequential experimentation, 
using 20 expert raters from the fields of psy- 
chiatry, clinical psychology, and psychiatric 
social work. Interdisciplinary agreement was 
94% on the Child’s PALS and 98% on the 
PEN PALS. 

A limited exploratory study was made to 
determine if the PALS tests could distinguish 
between two populations of known character- 
istics in a given area. The area was a record 
of delinquent behavior, i.e., referral to some 
legal or social agency for antisocial or socially 
unacceptable behavior. Boys with such a rec- 
ord were termed Acting-out, boys without 
such record but who were normally competi- 
tive, were termed Normal. Both groups, 50 in 
each, had been living with both natural par- 
ents and were matched for age, race, socio- 
economic status, and type of school attended. 
Self-administered individual tests were given, 
the PEN PALS first. 

Results were evaluated at a child-behavior 
level: samples of a known tendency of one 
group to act-out in a socially undesirable 
manner and the other group to conform in a 
socially desirable way. Parental combinations 
were thus divided into two major groups: 
those showing both parents high in Love, 
hence Socially Desirable; those showing one 
or both parents low in Love (rejecting) or 
psychologically unknown, hence Socially Un- 
desirable. 

All predictions were supported at a high 
level of statistical significance. Twenty-three 
of the possible 25 parental combinations were 
used, with Normals showing a much higher 
percentage of the two most acceptable paren- 
tal patterns. These showed the father as high 
in both Authority and Love, with one pattern 
showing the mother also high in both, the 
other showing the mother high in Love only. 
Both forms, individually and in combination, 
showed Normals more likely to rate their pa- 
rental combinations and their individual par- 
ents as Socially Desirable and Acting-out chil- 
dren more likely to rate their parents as So- 
cially Undesirable. The Acting-out group also 
tended to change an individual parent’s clas- 
sification from undesirable on one test form 
(PEN PALS) to desirable on the other form 
(Child’s PALS) more often than did the 
Normals. 

It is concluded that since the PALS tests 
have been established as having the ability to 
discriminate between these two populations of 
boys they may also be useful in the research 
and diagnosis of other behavior syndromes
and other populations. 

Cultural Symbolism: The Age Variable


R. G. Stennett and Merle Thurlow 1, 2
Ontario Hospital, St. Thomas, Ontario, Canada 


Levy (1) reported that his experiment with 
62 normal school children did not support the 
Freudian hypothesis of sexual symbolism. 
Starer (2) subsequently reported a similar 
experiment with male and female psychotic Ss 
and student nurses, which gave strong sup- 
port to this same hypothesis. The present 
study attempts to discover the source of the 
discrepancy in the results of these two in- 
vestigations. 

The first part of this study consisted of a 
replication of Starer’s (2) experiment with a 
modification in the number of Ss, viz., 20 psy- 
chotic adults, 10 male and 10 female, and 25 
university students, 15 male and 10 female. 

As in Starer’s original study, Ss were asked 
to match 10 names (five male, five female) 
with 10 figures (five “male” symbols and five 
“female” symbols). 

The number of correct matches for the psy- 
chotic and university populations were sig- 
nificant (chi squares 35.47 and 52.19 respec- 
tively, p < .001). 
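
The report does not say how these chi squares were computed or with how many degrees of freedom. As a rough sketch only, one simple way to compare pooled matches with chance is a one-sample chi square with 1 df, assuming a chance rate of .5 for matching a name to a figure of the same sex. The function name and the counts below are hypothetical and are not data from the study; pooling matches across Ss also ignores the dependence among one S's ten matches.

```python
def chi_square_vs_chance(correct, total, p_chance=0.5):
    """One-sample chi square (1 df): observed correct/incorrect vs. chance."""
    wrong = total - correct
    expected_correct = total * p_chance
    expected_wrong = total * (1 - p_chance)
    return ((correct - expected_correct) ** 2 / expected_correct
            + (wrong - expected_wrong) ** 2 / expected_wrong)


# Hypothetical illustration only: 20 Ss x 10 matches = 200 matches,
# of which 130 are assumed correct. These are NOT the study's counts.
stat = chi_square_vs_chance(correct=130, total=200)
print(round(stat, 2))  # compare with 10.83, the .001 critical value for 1 df
```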

These results confirm Starer’s findings. 
They do not point directly to the cause of the 
Starer-Levy discrepancy, but do provide addi- 
tional support for the hypothesis of cultural 
symbolism. 

In the search for the factors which might 
have produced the discrepancy between the 
Levy and Starer results, several possible vari- 
ables suggested themselves: (a) Starer used 
individual testing, whereas Levy used a group-
testing technique; (b) each author used stim-
ulus figures of his own design; and (c) the


1 The authors wish to thank E. Starer and L. H. 
Levy who kindly provided copies of the figures and 
names used in their respective studies. 

2 An extended report of this study may be ob-
tained without charge from R. G. Stennett, Chief
Psychologist, Ontario Hospital, St. Thomas, Ontario, 
Canada, or for a fee from the American Documenta- 
tion Institute. Order No. 5741, remitting $1.25 for 
microfilm or $1.25 for photocopies. 


subject populations differed with respect to 
composition (normal vs. psychotic and nor- 
mal) and age (children vs. adults). However, 
since Starer’s results with student nurses were 
positive and since the results of our replica- 
tion of Starer’s work with university students
were positive, the normal vs. psychotic dif-
ference did not seem important. We also rea-
soned that the group vs. individual testing
difference was relatively unimportant. It 
seemed, therefore, that the discrepant results 
were due to either differences in the stimulus 
figures or differences in the age of the popula- 
tions studied. The purpose of the second part 
of the study was to investigate the figure 
variable. 

Ss in this part (37 student nurses, with a 
mean age of 19) were group tested with both 
the Levy and Starer figures, with instructions 
to match the names to the figures. 

The Ss matched the Levy and Starer figures 
with male and female names better than 
chance. The chi squares for the Levy and 
Starer figures were 13.72 (p < .01) and 38.90 
(p < .001) respectively. 

Since a group-testing procedure was used 
with both Levy and Starer figures and since 
Ss matched names to both sets in a statisti- 
cally significant manner, differences in the 
stimulus figures cannot account for the dis- 
crepancy under investigation. The variable of 
age, therefore, assumes crucial importance. 
This suggests that a careful developmental 
study of cultural symbolism would reveal data 
of theoretical importance. 


Brief Report. 
Received June 13, 1958. 

PSYCHOLOGICAL
TEST 


New Test 


Cattell, Raymond B., Beloff, Halla, & Coan, Richard 
W. IPAT High School Personality Questionnaire 
(H.S.P.Q.). Ages 12-17 years. 2 forms. Untimed, 
(40) min. Test booklet Form A or Form B ($4.00 
per 25 or $3.50 per 2 or more packages); answer 
sheet ($1.90 per 50 or $14.90 per 500); hand scor- 
ing key (60¢); handbook, pp. 58 ($2.20), and 
tabular supplement of norms, pp. 10 (80¢); sam-
ple set ($3.10). Champaign, Ill.: Institute for Per-
sonality and Ability Testing, 1958.

The H.S.P.Q. is a thorough revision of the earlier
IPAT Junior Personality Quiz, and is a close rela- 
tive of the Sixteen Personality Factor Questionnaire. 
Each form of the H.S.P.Q. contains 140 forced-choice 
items, 10 items for each of 14 factors. No item is 
scored for more than one factor. The 14 factors are 
drawn from the well-known factor-analytic studies 
of the senior author, and include Schizothymia vs. 
Cyclothymia, Intelligence, Neuroticism vs. Ego- 
strength, Phlegmatic Temperament vs. Excitability, 
etc. The provision of two forms is an unusual asset 
among personality questionnaires. The handbook rec- 
ommends that one form be used for survey testing 
with the second form for retesting, or that both 
forms be used for more intensive individual studies 
so as to increase reliability. 

Data in the handbook bring to sharp focus an 
issue emphasized by the test’s authors—broad cover- 
age with few items per factor versus more reliable 
coverage of a smaller number of variables. Judged 
by usual psychometric standards, the reliabilities of 
the individual factor scores are low. Retest with one 
form after two weeks yields factor reliabilities of 
from .52 to .66 with a median of .58. The correla-
tions of the factor scores obtained from the two
forms are even lower, from .26 to .51, with a median 
of .37. The latter data certainly raise questions about 
the equivalence of the forms and about the success 
with which the factors have been represented by the 
items. On the other hand, the construct validities, in 
terms of the multiple correlations of the items with 
the pure factors as obtained from the original factor 
analyses, are reported as fairly high, ranging from 
.58 to .78 for each form, and from .73 to .88 for the
two forms used together. Intercorrelations are gen-
erally low, with only 5 of 91 r’s exceeding .30.
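
The count of 91 intercorrelations is consistent with 14 factor scores taken in pairs. As a quick check (our arithmetic, not a figure quoted from the handbook):

```latex
\binom{14}{2} = \frac{14 \times 13}{2} = 91
```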

Tentative norms in stens (10-point standard
scores) and deciles are based on 1,189 American 
boys and girls, ranging in age from 11 to 20, with a 
median of 14.5 years. Their geographic distribution 
is unspecified, but they seem to come mainly from 
a few midwestern cities. Tables are given for boys 
and for girls separately, and for the sexes combined. 
Age corrections are supplied for the two factor 
scores which vary significantly with age. 

The authors urge that no factor score be inter- 
preted separately. Two methods for global use are 
described, by matching profiles and by regression 
equations. Unfortunately, these useful methods are
represented only by illustrative examples. There is 
no compendium of criterion profiles or of regression 
equations against external criteria. The two illustra- 
tions of the regression method—for the prediction of 
leadership and of school achievement—are so inade- 
quately described that the reader is unable to evalu- 
ate the merit of the research on which they are 
based. In contrast to the implied rigor of the regres- 
sion equation method, the handbook contains sev- 
eral pages of “clinical” suggestions, supported by no 
explicit data. 

In summary, the reviewer sees the H.S.P.Q. as a
potentially valuable instrument when further re- 
search has supplied sound actuarial data. But it is 
subject to serious misuse if the unreliable single fac- 
tor scores are given stereotyped interpretations, sup- 
ported only by the interpreter’s intuitive hunches 
about the meanings of the factors.—L. F. S.


Erratum 


The review of the Minnesota Counseling Inventory 
in the June issue of this Journal (p. 241) contained 
the misstatement that “there is no indication of item 
analysis against a criterion.” Although no detailed 
data are given concerning the item analyses, the 
Manual (p. 24) states that all items were evaluated 
against the external criteria of nominations and rat-
ings, and against the internal criterion of total score
on the appropriate scale. The Editor is grateful to 
Dr. Wilbur L. Layton for calling his attention to this 
oversight. —L. F. S. 

Brief Reports


The Journal of Consulting Psychology will 
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 


An author who wishes to submit a Brief 
Report: 

1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 


2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 


4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 

Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 

To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 75 lines av-
eraging 42 characters and spaces in length.
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 
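
The length limit can be checked mechanically. The following is a minimal sketch, assuming the typescript is available as plain text with the title and author lines already removed, and assuming that the blank lines produced by double spacing are not counted. The function names are hypothetical. The 12 and 10 characters per inch used for elite and pica are the usual typewriter pitches and agree with the 3.5-inch and 4.2-inch figures given above.

```python
MAX_LINES = 75        # line quota stated in the specifications
MAX_AVG_CHARS = 42    # average characters and spaces per line
CHARS_PER_INCH = {"elite": 12, "pica": 10}  # standard typewriter pitches


def check_brief_report(text):
    """Count non-blank lines and their average length against the quota."""
    # Assumption: blank lines (from double spacing) do not count as lines.
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    n = len(lines)
    avg = sum(len(ln) for ln in lines) / n if n else 0.0
    return {
        "lines": n,
        "avg_chars": avg,
        "fits": n <= MAX_LINES and avg <= MAX_AVG_CHARS,
    }


def line_width_inches(chars, typeface):
    """Width in inches of a typed line of `chars` characters."""
    return chars / CHARS_PER_INCH[typeface]


if __name__ == "__main__":
    sample = "\n\n".join(["x" * 42] * 75)  # 75 double-spaced lines of 42 chars
    print(check_brief_report(sample))
    print(line_width_inches(42, "elite"), line_width_inches(42, "pica"))
```

Because the quota is an average, individual lines may run slightly over 42 characters as long as the mean does not.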

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 75 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 75-line quota: 1


1 An extended report of this study may be ob-
tained without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name 
and address), or for a fee from the American Docu- 
mentation Institute. Order Document No. ——, re- 
mitting $—— for microfilm or $—— for photo- 
copies. 


Extended report. Because the extended re- 
port is intended for photoduplication, and is 
not copy to be sent to a printer, its style 
should differ in several ways from that of 
other manuscripts: (a) The extended report 
should be typed with single spacing for 
economy in duplication. (b) Tables and fig-
ures should be placed adjacent to the text
which refers to them. A caption should be 
typed below each figure. (c) Footnotes should 
be typed at the bottom of the page on which 
reference is made to them. In other respects,
the full report is prepared in the style speci- 
fied by the Publication Manual (1). 


Reference 


1. American Psychological Association. Council of 
Editors. Publication manual of the American 
Psychological Association (1957 rev.). Wash- 
ington, D. C.: American Psychological Asso- 
ciation, 1957. 